Best Practices: Using Dependency Files
What are Dependency Files?
If you install packages during a session (via either pip install or
install.packages), read this article on creating and using dependency files.
In a nutshell, dependency files contain a list of the libraries and packages installed in a project environment. It’s good practice to keep dependency files for each project you have, regardless of whether you are using our pre-built dependency collections.
Dependency files are very handy when you want to re-create the environment in which you developed your model, whether on the Platform or on your laptop. Dependency files are necessary to create the environment of your deployed API or scheduled run. This is especially true if you install extra packages during a session.
The Platform currently supports dependency files for (i) pip,
(ii) R, and (iii) apt.
How to Create Dependency Files in a Jupyter Session
Creating a pip Dependency File in a Jupyter Python Session
The easiest way to create a complete dependency file in a Python project
is to use the !pip freeze command in a Jupyter notebook session. As
you work in your Jupyter session, you will likely install packages. Run
the pip freeze command when you are ready to either close your session or
deploy your model (don’t forget to Sync!). The snapshot below shows the command.
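Capturing the pip freeze output in a file can be sketched as follows. This is a minimal sketch, not the Platform's own tooling: the helper name freeze_to_file and its command parameter are hypothetical, and the file name requirements_python.txt is borrowed from the install command used later in this article. In a notebook cell, the shell form !pip freeze > requirements_python.txt should have the same effect.

```python
import subprocess
import sys


def freeze_to_file(path="requirements_python.txt", command=None):
    """Write the output of `pip freeze` to `path` and return its lines.

    `command` is a hypothetical override, mainly useful for testing. By
    default pip is run through the interpreter executing this code, so the
    list reflects the environment of the current session.
    """
    if command is None:
        command = [sys.executable, "-m", "pip", "freeze"]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    # Keep one requirement per line and drop empty lines.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

Calling freeze_to_file() with no arguments writes the file to the current working directory, so run it from the top-level folder of your project.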
In the pip freeze output shown in the image, every package name is followed
by == and a version number. This denotes the specific version of the package
installed in your environment. When you use a pip requirements file to install
these libraries in a new environment, you can always relax the == constraint
by using a looser specifier such as >=, or by omitting the version entirely.
Below is an example with the Python library pandas:
pandas==0.15.2  # exact version match
pandas>=0.15.2  # any version of pandas greater than or equal to 0.15.2
pandas          # install the most recent version of pandas available on PyPI (pypi.python.org/pypi)
You can find more on the pip requirements file format in the pip documentation.
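If you want to relax many pins at once, the rewrite can be scripted. The sketch below is illustrative only: relax_pin is a hypothetical helper, it handles only simple name==version lines, and it leaves anything else (including pip's === specifier) untouched.

```python
def relax_pin(requirement, mode="minimum"):
    """Relax an exact `name==version` pin.

    mode="minimum" gives `name>=version`; mode="latest" gives the bare name.
    Lines without a simple `==` pin are returned as-is, minus any trailing
    comment.
    """
    spec = requirement.split("#", 1)[0].strip()  # drop a trailing comment
    if "==" not in spec or "===" in spec:
        return spec
    name, version = (part.strip() for part in spec.split("==", 1))
    if mode == "minimum":
        return f"{name}>={version}"
    if mode == "latest":
        return name
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, relax_pin("pandas==0.15.2") produces the second line of the pandas example above, and relax_pin("pandas==0.15.2", mode="latest") produces the third.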
Creating a Dependency File in an R Jupyter Session
In R sessions, you can get a list of the installed packages by running
installed.packages() in a notebook cell. The snapshot below displays how you can do this within a Jupyter session. Note
that for R, the Platform installer only accepts package names
in the dependency file and installs the latest stable version of each.
Make sure you (i) list one package per line and (ii) do not include the version number.
Apt Dependency Files
In addition to the pip and R package managers, you can also create a
dependency file for apt. Apt stands for Advanced Package Tool and is
a set of tools for managing Debian packages. (Note that the Platform
runs the Debian OS.) If you want to install apt
dependencies, we recommend listing them in a file called
requirements_apt.txt. You can do so directly in the Platform by
opening a new text file in a Jupyter session. As for R, the
apt installer will install the latest stable version of each package.
The format of the requirements_apt.txt file is the same as for the
R package manager: (i) list one package per line and (ii) do not
include the package version.
Here’s an example of the content of a short requirements_apt.txt file:
r-base
libreadline-dev
gfortran
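The two format rules shared by the R and apt dependency files (one package per line, no version) can be checked before you deploy. The following is a rough sketch under stated assumptions: check_names_only is a hypothetical helper, and the set of characters allowed in a package name is a guess that covers the examples in this article rather than an official rule.

```python
import re

# Assumption: a package name starts with a letter or digit and contains only
# letters, digits, '.', '+', '_' and '-'. Spaces and '=' are rejected, which
# catches several packages on one line as well as version specifiers.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.+_-]*$")


def check_names_only(text):
    """Return (line_number, line) pairs that break the format rules."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if not _NAME.match(line):
            problems.append((number, line))
    return problems
```

An empty list means every non-blank line looks like a bare package name. The check cannot tell whether a package actually exists.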
Using Dependency Files on the Platform
In the previous section you learned how to create dependency files. Now you will learn why you should use these dependency files and how you can use them in your workflow.
In a Jupyter, RStudio, or Zeppelin Session
Dependency files are particularly useful when you are migrating work to
the Platform. Let’s suppose you have developed a model on your laptop
and you want to move it onto the Platform. Reproducing your laptop Python
environment on the Platform is easy if you captured the dependencies via
pip freeze. Just run the following command on the Platform in a notebook cell:
!pip install -r requirements_python.txt
The packages on the Platform will match the ones you have used in your local/dev environment.
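To confirm that the installed packages really match the pins, you could compare the requirements file against the current environment. The sketch below is one possible approach, not a Platform feature: installed_version and find_mismatches are hypothetical helpers, only simple name==version lines are checked, and versions are compared as plain strings (so 1.0 and 1.0.0 count as different).

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of `name`, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(requirement_lines, lookup=installed_version):
    """Compare exact `name==version` pins against the current environment.

    Returns (name, wanted, found) tuples, where `found` is None for packages
    that are not installed. Lines without a simple `==` pin are skipped.
    """
    mismatches = []
    for raw in requirement_lines:
        spec = raw.split("#", 1)[0].strip()  # drop a trailing comment
        if "==" not in spec or "===" in spec:
            continue
        name, wanted = (part.strip() for part in spec.split("==", 1))
        found = lookup(name)
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches
```

Passing an open requirements_python.txt file object as requirement_lines works, since the function only iterates over lines. An empty result means every pinned package is installed at the pinned version.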
Dependency files are very useful when creating (or re-creating) an
environment. In an R session, you can also install many packages at once from a
requirements_r.txt file. In a Jupyter notebook cell, run the
following commands, where requirements_r.txt is the file
you created previously:
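# Read the package names, one per line, into a data frame of strings
packageList <- read.csv('requirements_r.txt', header=FALSE, col.names=c('packages'), stringsAsFactors=FALSE)
# Extract the single column as a character vector
packageList <- as.vector(packageList[, 'packages'])
# install.packages accepts a vector of names, so no lapply is needed
install.packages(packageList)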
The same three commands can be executed within an RStudio session.
When Deploying an API
When deploying your model as a REST API, it is important that the API environment matches the one you used to develop the model. You achieve this by using dependency files. In the snapshot below we show where to put the names of the dependency files in the Deploy an API window.
When Scheduling a Run
The same idea applies when scheduling a run. In the snapshot below you can see where the requirements files can be inserted.
General Tips and Best Practices
- Put your requirements in the top level folder of your project.
- Add the installer suffix to your dependency file names. For example, this article uses requirements_python.txt, requirements_r.txt, and requirements_apt.txt.