Best Practices: Using Dependency Files

What are Dependency Files?

If you are installing packages during a session (either via pip or install.packages), read this article on creating and using dependency files.

In a nutshell, dependency files contain a list of the libraries and packages installed in a project environment. It’s good practice to keep dependency files for each project you have, regardless of whether you are using our pre-built dependency collections.

Dependency files are very handy when you want to re-create the environment in which you developed your model, whether on the Platform or on your laptop. Dependency files are necessary to create the environment of your deployed API or scheduled run. This is especially true if you install extra packages during a session.

The Platform currently supports dependency files for (i) pip (Python), (ii) R, and (iii) apt.

How to Create Dependency Files in a Jupyter Session

Creating a pip Dependency File in a Jupyter Python Session

The easiest way to create a complete dependency file in a Python project is to use the !pip freeze command in a Jupyter notebook session. As you work in your Jupyter session, you will likely install packages. Run the pip freeze command when you are ready to either close your session or deploy your model (don’t forget to Sync!). We show the command in the snapshot below.
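If you would rather save the output to a file than copy it from the notebook, a minimal Python sketch along these lines may help. The function name `freeze_to_file` and its `command` parameter are hypothetical and not part of the Platform. The default file name follows the naming tip at the end of this article.

```python
import subprocess
import sys
from pathlib import Path


def freeze_to_file(path="requirements_pip.txt", command=None):
    """Write the output of `pip freeze` to `path` and return the lines written."""
    if command is None:
        # Run pip through the current interpreter so that the packages
        # of the active kernel environment are the ones captured.
        command = [sys.executable, "-m", "pip", "freeze"]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    Path(path).write_text(result.stdout)
    return result.stdout.splitlines()
```

Calling `freeze_to_file()` in a notebook cell should leave a requirements_pip.txt file in the current working directory. Remember to Sync afterwards.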


In the image, you can see that all packages have a == sign. This denotes the specific version of the package installed in your environment. When you use a pip requirements file to install these libraries in a new environment, you can always relax the constraint == by using the >=, >, <, and <= signs. Below is an example with the Python library pandas:

pandas==0.15.2 # exact version match
pandas>=0.15.2 # any version of pandas greater than or equal to 0.15.2
pandas # the most recent version of pandas available on PyPI

You can find more on the pip requirements file format in the pip documentation.
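Relaxing the pins by hand gets tedious for a long pip freeze listing. The sketch below shows one way to rewrite plain name==version lines with a looser operator. The helper name `relax_pins` is hypothetical, and the sketch leaves comments, option lines (those starting with a dash), and unpinned names untouched.

```python
def relax_pins(lines, operator=">="):
    """Replace the '==' in plain 'name==version' lines with `operator`."""
    relaxed = []
    for line in lines:
        entry = line.strip()
        # Only rewrite simple pins; skip comments and pip option lines.
        is_pin = "==" in entry and not entry.startswith(("#", "-"))
        if is_pin:
            name, version = entry.split("==", 1)
            entry = f"{name.strip()}{operator}{version.strip()}"
        relaxed.append(entry)
    return relaxed
```

For example, `relax_pins(["pandas==0.15.2"])` returns `["pandas>=0.15.2"]`. Whether a relaxed constraint is safe for your model is a judgment call: exact pins reproduce the environment most faithfully.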

Creating a Dependency File in an R Jupyter Session

In R sessions, you can get a list of the installed packages by calling installed.packages() in a notebook cell. The snapshot below displays how you can do this within a Jupyter session. Note that for R, the Platform installer will only accept the package names in the dependency file and will install the latest stable version.

Make sure you (i) list one package per line and (ii) do not include the version number.


Apt Dependency Files

In addition to the pip and R package managers, you can also create a dependency file for apt. Apt stands for Advanced Package Tool and is a set of tools for managing Debian packages. (Note that the Platform runs the Debian OS.) If you want to install apt dependencies, we recommend listing them in a file called requirements_apt.txt. You can create this file directly on the Platform by opening a new text file in a Jupyter session. As with R, the apt installer will install the latest stable version of each package listed in the requirements_apt.txt file.

The format of the requirements_apt.txt file is the same as for the R package manager: (i) list one package per line and (ii) do not include the package version.

Here’s an example of the content of a short requirements_apt.txt file:

r-base
libreadline-dev
gfortran
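The two format rules shared by the R and apt dependency files (one package per line, no version) can be checked before you deploy. The following is a rough sketch only: the function name `check_names_only` and the character pattern used for package names are assumptions, not rules enforced by the Platform.

```python
import re

# Assumption: a package name starts with a letter or digit and contains
# only letters, digits, dots, plus signs, and hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.+-]*")


def check_names_only(lines):
    """Return (line_number, problem) pairs for lines that break the format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        entry = line.strip()
        if not entry:
            continue  # blank lines are ignored in this sketch
        if len(entry.split()) > 1:
            problems.append((number, "more than one token on the line"))
        elif not NAME_PATTERN.fullmatch(entry):
            problems.append((number, "unexpected characters (version specifier?)"))
    return problems
```

An empty result means every non-blank line holds a single bare package name.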

Using Dependency Files on the Platform

In the previous section you learned how to create dependency files. Now you will learn why you should use these dependency files and how you can use them in your workflow.

In a Jupyter, RStudio, or Zeppelin Session

Dependency files are particularly useful when you are migrating work onto the Platform. Suppose you have developed a model on your laptop and you want to move it onto the Platform. Reproducing your laptop Python environment on the Platform is easy if you captured the dependencies via pip freeze. Just run the following command on the Platform in a notebook cell:

!pip install -r requirements_python.txt

The packages on the Platform will match the ones you have used in your local/dev environment.
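To confirm that the installed packages really match the pinned versions, you could compare the requirements file against the active environment. The sketch below is one possible approach using the standard library's `importlib.metadata`. The helper names are hypothetical, and only exact name==version pins are checked.

```python
import re
from importlib import metadata


def normalize(name):
    """Lowercase a package name and unify '-', '_' and '.' separators."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions():
    """Map each installed distribution name (normalized) to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            versions[normalize(name)] = dist.version
    return versions


def find_mismatches(requirement_lines, installed):
    """Return (name, wanted, found) for pins not satisfied by `installed`."""
    mismatches = []
    for line in requirement_lines:
        entry = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in entry:
            continue  # only exact pins are checked in this sketch
        name, wanted = (part.strip() for part in entry.split("==", 1))
        found = installed.get(normalize(name))  # None if not installed
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches
```

In a notebook cell, you might read the lines of requirements_python.txt and call `find_mismatches(lines, installed_versions())`. An empty list suggests that the pinned packages are present at the expected versions.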

You can also install R packages in bulk from a requirements_r.txt file. Assuming the file was created previously, run the following commands in a Jupyter notebook cell of an R session:

# Read the package names (one per line) from the dependency file
packageList <- read.csv('requirements_r.txt', header=FALSE, col.names=c('packages'))
packageList <- as.vector(packageList[,])
# install.packages() accepts a character vector of package names
install.packages(packageList)

The same three commands can be executed within an RStudio session.

When Deploying an API

When deploying your model as a REST API, it is important that the API environment matches the one you used to develop the model. You achieve this by using dependency files. In the snapshot below we show where to put the names of the dependency files in the Deploy an API window.


When Scheduling a Run

The same idea applies when scheduling a run. In the snapshot below you can see where the requirements files can be inserted.


General Tips and Best Practices

  • Put your dependency files in the top-level folder of your project.
  • Add the installer suffix to your dependency file names. For example: requirements_pip.txt, requirements_r.txt, and requirements_apt.txt.

Additional References on Dependency Files