Best Practices: Deploying an API

Deploying a function or model as an API can help encapsulate data science work and expose it to other team members and applications. APIs can be used to do the following:

  • Provide predictions as a microservice to a larger application.
  • Provide inputs to visualization software like Tableau or Bokeh.
  • Serve as a means of sharing your model with other analysts and data scientists.

This guide provides best practices for deploying APIs on the DataScience.com Platform.

Building the API Script

The Deploy Timeout

When you deploy an API in the Platform, the following actions occur:

  • A container is provisioned with the specifications you set.
  • The source files are copied onto the container.
  • All dependencies listed in requirements files are installed via pip, install.packages, and apt-get.
  • A web server is launched on the container.
  • A number of workers (processes) are launched.
  • Each worker executes your script.

The web server uses a parameter called a timeout, which is set to 30 seconds. Any worker that is unresponsive for a period longer than the timeout is shut down. Thus, if the code in the API function takes longer than the timeout, the API will fail. Currently, the timeout cannot be configured. Before deploying your API, you can try running the script in a Platform Session to determine whether your code takes too long to execute. We recommend that the API function be as fast and lightweight as possible to minimize API response times.
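For example, one quick check in a Session is to time a single call to the function you plan to deploy. A minimal sketch, where my_predict and sample_data are placeholders for your own function and a representative input:

import time

start = time.time()
result = my_predict(sample_data)  # the function you plan to deploy, on representative input
elapsed = time.time() - start

# If a single call approaches the 30-second timeout, move expensive work
# (such as loading or training the model) outside the deployed function.
print('my_predict took {:.2f} seconds'.format(elapsed))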

Pickling vs. Training

You can include a model in your deploy script either by training it in the script or by loading a serialized, pre-trained model. We generally recommend loading a serialized model rather than training one in the deploy script: loading results in faster build times and helps avoid timeout issues.

For introductions to serializing data (pickling), see these articles written for Python and R.
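For reference, serialization typically happens ahead of time, for example in a Session or a scheduled run. A minimal Python sketch, where model is any already-trained object (saveRDS plays the same role in R):

import pickle

# model is a trained object from an earlier step, e.g. a fitted scikit-learn estimator.
# Write it to disk so the deploy script can load it instead of retraining.
with open('model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)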

Training a model within a deploy script:

In Python:

from sklearn.ensemble import RandomForestClassifier

# X and y are your training features and labels
model = RandomForestClassifier()
model.fit(X, y)

def my_predict(data):
    return {'predictions': model.predict(data).tolist()}

In R:

# df contains the columns y and x
model <- lm(y ~ x, data = df)

my_predict <- function(data){
    return(predict(model, data))
}

Loading a serialized model within a deploy script:

In Python:

import pickle

# Open the file in binary mode for pickle
with open(model_path, 'rb') as model_file:
    model = pickle.load(model_file)

def predict(data):
    return {'predictions': model.predict(data).tolist()}

In R:

model <- readRDS(model_path)

my_predict <- function(data){
    return(predict(model, data))
}

Pickled models can either be stored as part of the project repository, and therefore tied to version control, or can be stored in a remote file storage system like Amazon S3.
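If the model file lives in S3, the deploy script can download it before loading. A minimal sketch using boto3, with the bucket and key names as placeholders for your own:

import pickle

import boto3

# Hypothetical bucket and key names; replace them with your own.
s3 = boto3.client('s3')
s3.download_file('my-model-bucket', 'models/model.pkl', 'model.pkl')

with open('model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)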

Some users prefer to keep the training process in the deploy script to promote reproducibility and transparency. The tradeoff to this is slower deployment.

Choosing a Response Type for a Deployed API

Once your model is deployed, every result it returns is converted to JSON before being sent over HTTP to the client. Anything the model returns must be a data structure that can be converted to JSON. Responses should be limited to combinations of strings, floats, integers, lists, or dictionaries.

For examples of how to transform common data structures into acceptable formats, see the table below. The leftmost column represents a given data structure in a particular language. The middle column represents how to transform the data structure to be JSON-friendly. “N/A” denotes that there is not an obvious means of conversion. “OK” denotes the data structure does not need to be transformed.

Original Structure    JSON-Friendly Transformation    Language
------------------    ----------------------------    --------
dict                  OK                              Python
numpy.ndarray         x.tolist()                      Python
list                  OK                              Python
str                   OK                              Python
pandas.Series         x.to_dict()                     Python
pandas.DataFrame      x.to_dict()                     Python
list                  as.matrix(as.data.frame(x))     R
data.frame            as.matrix(x)                    R
character             OK                              R
double                OK                              R
matrix                OK                              R
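As an example of applying these transformations, here is a sketch of a deployed function that builds its result as a pandas DataFrame before returning it; model is assumed to be defined earlier in the deploy script:

import pandas as pd

def my_predict(data):
    predictions = model.predict(data)                    # numpy.ndarray
    results = pd.DataFrame({'prediction': predictions})  # pandas.DataFrame
    # Neither an ndarray nor a DataFrame is JSON-serializable as-is;
    # to_dict() converts the DataFrame into nested dicts and lists, which are.
    return results.to_dict()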

Deploying the API

Document Your Model or Function with a README

Documentation can help ensure that stakeholders and model consumers understand your model. Some things you may want to record include:

  • Request signatures: Provide examples of what types of requests are valid. Describe the potential source(s) of the requested data.
  • Response signatures: Provide examples of what the API returns so consumers know how to integrate results into their applications.
  • A description of the model: As you and your organization accumulate more APIs, it will become difficult to recall the mechanics and details of every API. Including comments and metadata about training data, algorithms, hyperparameters, training time, etc. will help expedite onboarding, enable other team members to understand previous work, and make it easier to diagnose and improve the model in the future.
  • A latency estimate: If other team members are going to be calling your model, it may be helpful to indicate how long your model takes to return responses. This will help ensure that model consumers can evaluate whether the model is fast enough for their application.
  • Release notes: Any time the model is updated, either by retraining it, changing the request/response signatures, or changing the source code/choice of algorithm(s), it is important to update the documentation with details of the change. This will help avoid unintended errors and make it easier to revert to earlier versions if a rollback is needed.

Submitting Requests to Your API

To submit a request to your API, you’ll need the API URL and its cookie string. You can find this data in the Versions tab of your model, under the API Endpoint header, as shown below:

[Screenshot: API Endpoint details in the Versions tab]

Below are examples of how to call your model API from R and Python.

To call the API, pass in the URL and a request body, specify JSON encoding, skip SSL certificate verification, and provide the cookie. A dictionary is a convenient data type for the request body: its keys are taken as the names of the arguments of the deployed function.

In Python:

import requests

url = 'https://myenv.datascience.com/deploy/mymodel/'
cookies = {'datascience-platform': 'my-cookie'}
body = {'data': data.tolist()}

predict = requests.post(url,
                        cookies=cookies,
                        verify=False,
                        json=body)

In R:

library('httr')

url <- 'https://myenv.datascience.com/deploy/mymodel/'
cookie_string <- 'my-cookie'
body <- list(data = mydata)

request <- POST(url,
                body = body,
                encode = 'json',
                config = config(ssl_verifypeer = 0L),
                set_cookies("datascience-platform" = cookie_string))
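Either way, the response body is JSON. In Python, for instance, you can decode it directly from the response object returned above; the structure of the result mirrors whatever your deployed function returns:

# `predict` is the requests.Response object from the Python example above.
result = predict.json()  # decode the JSON response body
print(result)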

Send More Records and Fewer Requests

If you are deploying an API to score requests, you can decrease overall latency by batching multiple records into each request rather than sending one request per record. Concretely:

requests.post(url, cookies=cookies, verify=False, json=[user_1, user_2])

will typically outperform

requests.post(url, cookies=cookies, verify=False, json=[user_1])
requests.post(url, cookies=cookies, verify=False, json=[user_2])

This is subject to the memory limits of the container executing the deployed API.

Run APIs on Larger Containers to Improve Response Times

If your deployed API is resource-intensive and the container running it is undersized, API latency will increase. Besides reducing the resource load of your API, try deploying it on a container with more CPU and memory. See our article on allocating resources for containers for details.

Use Resource Pool over On-Demand for Faster Builds

If your organization’s environment supports on-demand containers, note that building on-demand containers requires the additional step of provisioning resources, which can take several extra minutes. To avoid longer builds, run APIs on your shared resource pool. This is particularly useful when prototyping.

Prototyping to Production: Developing Internal Standards

If your model or function is being consumed by other team members and/or applications, it's worth agreeing on internal standards. Best practices vary across teams and use cases, but ideally everyone on a team follows a common set of guidelines. These may include:

  • Coding style guides, comments, and design patterns:
    • To improve its readability, your code should be written clearly, be well-commented, and follow coherent design patterns. See Google’s R style guide and PEP 8 for examples.
  • Documentation:
    • Documentation can help others understand the context of your work and provide instructions on how to use it. Undocumented projects or models may fall into disuse after the original developer has left the project.
  • Continuous Integration and Unit Testing:
    • These processes help ensure that your model or function works as expected at all times, for every team member and application that depends on it. If you update the source code of your function, an automated test suite can help you avoid unexpectedly breaking other applications (see the sketch after this list).
  • Profiling and Optimization:
    • If others are consuming your API, it's important to agree on an acceptable latency so that expectations are aligned. If the API fails to meet that target, memory and/or execution time profilers can help identify bottlenecks so you can address them.
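As an illustration of the unit testing point above, a minimal pytest-style sketch for a deployed function like the Python predict example at the end of this guide; the module name deploy_script is hypothetical:

import numpy as np

from deploy_script import predict  # hypothetical module containing the deployed function

def test_predict_returns_json_friendly_list():
    data = np.random.normal(0, 1, size=(5, 3))
    result = predict(data)
    # The deployed function should return plain Python types that serialize to JSON.
    assert isinstance(result, list)
    assert len(result) == 5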

Dependencies

Whether you deploy models in Python or R, it is preferable to capture installations in requirements files rather than installing packages directly in the model script. Once the deploy workers are running, the Platform treats any worker that is unresponsive for longer than the timeout as dead and kills it. Installations, especially those built from source files downloaded from the internet, take an unpredictable amount of time, so timeouts can occur if installations run as part of the worker's startup.

Python Libraries

You can list any packages your API requires in a requirements-py.txt file, and reference this file in the Deploy Configuration page. These packages will be installed from PyPI as part of the build process once you deploy your model. For details on Python requirements files, see the pip user guide.
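A requirements-py.txt file uses the standard pip requirements format: one package per line, with optional version pins. For example (the packages and versions below are only illustrative):

numpy
scikit-learn==0.19.1
requests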

R Libraries

List any R dependencies in a requirements-r.txt file. These packages must exist within the CRAN repository. Note that package versions cannot be specified in R requirements files; when a package is specified, the latest stable release will be installed. If you need to specify a version, you’ll need to install the package within the API script itself.
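A requirements-r.txt file is simply a list of CRAN package names, one per line. For example (illustrative):

httr
jsonlite
randomForest

If you do need a specific version, one common approach is to call a function such as remotes::install_version() inside the script, but keep in mind the timeout concerns described above.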

APT

Sometimes, especially when using certain R libraries, you will need system-level dependencies installed on the Debian container running your model. These packages are installed via apt-get install. List apt dependencies in a requirements-apt.txt file.
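requirements-apt.txt follows the same one-package-per-line pattern. The packages below are common examples needed to build R libraries such as xml2, openssl, and curl; your list will depend on your own dependencies:

libxml2-dev
libssl-dev
libcurl4-openssl-dev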

Examples

Below are examples of scripts you could deploy (the model) and scripts you could use to call the deployed model (the client).

Model

In Python:

import numpy as np
from sklearn.linear_model import LinearRegression

dim   = 3
N     = 100
X     = np.random.normal(0, 1, size = (N, dim))
beta  = np.random.normal(0, 1, size = dim)
err   = np.random.normal(0, 1, size = N)
y     = np.dot(X, beta) + err
model = LinearRegression().fit(X, y)

def predict(data):
    """Function to be deployed"""
    return model.predict(data).tolist()

In R:

dim  <- 3
N    <- 1000
beta <- rnorm(dim, mean = 0, sd = 1)
err  <- rnorm(N, mean = 0, sd = 1)
X    <- as.matrix(replicate(dim, rnorm(N, mean = 0, sd = 1)))
y    <- as.matrix(X %*% cbind(beta) + err)

colnames(X) <- c('x1', 'x2', 'x3')
colnames(y) <- c('y')

df <- data.frame(cbind(X, y))

model <- lm(y ~ x1 + x2 + x3, data = df)

predictor <- function(data){
    return(predict(model, data))
}

Client

In Python:

import numpy as np
import requests
from functools import partial

cookies = { 'datascience-platform': 'my-cookie'}
url     = 'https://myenv.datascience.com/deploy/mymodel/'
predict = partial(requests.post, url, cookies=cookies, verify=False)

def encode(data):
    return {'data':data.tolist()}

x1 = np.random.normal(0, 1, size = (1, 3))

predict(json=encode(x1))

In R:

library('httr')
url <- 'https://myenv.datascience.com/deploy/mymodel/'
cookie_string <- 'my-cookie'

encode <- function(data){
    return(list(data=data))
}

predictor <- function(data){
    response <- POST(url,
                     body = encode(data),
                     encode = 'json',
                     config = config(ssl_verifypeer = 0L),
                     set_cookies("datascience-platform" = cookie_string))
    return(content(response))
}

x1    <- as.matrix(replicate(3, rnorm(1, mean = 0, sd = 1)))

predictor(x1)