Best Practices: Migrating Existing Work

This article explains how to migrate existing work onto the Platform. It covers three cases:

  • Case 1: You want to move existing GitHub/GitLab/Bitbucket repositories onto the Platform
  • Case 2: You want to copy files from your local environment into an existing project on the Platform
  • Case 3: Your work is not in a version-controlled repository. Where do you start?

Moving Existing GitHub/GitLab/Bitbucket Repositories onto the Platform

This is the easiest case. Make sure that under Settings (accessible via your avatar drop-down in the top right) you have your Git provider credentials in place.

../_images/Git-credentials.png

Once you have verified your credentials, go back to the Projects page, click New Project, choose your Git provider, and enter the name of the repo you want to migrate to the Platform.

../_images/Git-repo.png

Copying Files from Your Local Environment into a Project

If you have a notebook or a script on your laptop and you want to move it into an existing project, you can use either of two methods:

  • Method 1: Clone the project's repository to your machine, copy the files into the clone, then git add and git commit them and push your branch to the remote. Open a session on the Platform under that project and the new files should be accessible there.
  • Method 2: You can add files to your project by using the Upload button within your Jupyter session.
../_images/Jupyter-session-upload.png
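
Method 1 can be sketched as a small shell function. This is only an illustration: the function name add_files_to_project, its arguments, and the commit message are hypothetical, and it assumes git is installed and that you have push access to the project's repository.

```shell
# Sketch of Method 1. The function name and its arguments are hypothetical.
# Usage: add_files_to_project <repo-url> <branch> <file>...
add_files_to_project() {
    local repo_url="$1" branch="$2"
    shift 2
    local workdir
    workdir="$(mktemp -d)" || return 1

    # Clone the project's repository and create a working branch.
    git clone --quiet "$repo_url" "$workdir/project" || return 1
    git -C "$workdir/project" checkout --quiet -b "$branch" || return 1

    # Copy the local files into the clone, then stage and commit them.
    cp -- "$@" "$workdir/project/" || return 1
    git -C "$workdir/project" add --all || return 1
    git -C "$workdir/project" commit --quiet -m "Add local files" || return 1

    # Push the branch to the remote.
    git -C "$workdir/project" push --quiet origin "$branch"
}
```

A call might look like add_files_to_project <your-repo-url> add-local-files notebook.ipynb script.py, where the URL is the one shown by your Git provider for the project's repository.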

If you have multiple files that you want to move to an existing project, create a file archive (tar) and upload it to the Platform. From a Python Jupyter notebook, enter the following command to unpack the file:

!tar -xvf filename.tar

If you have compressed the file with gzip, you can unpack and decompress the file with a single command:

!tar -xzvf filename.tar.gz
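
The archives used above can be created on your local machine before uploading. The following is a minimal sketch that assumes a Unix-like shell with tar available; the folder and file names are placeholders, and the sample files exist only to make the example self-contained.

```shell
# Scratch directory with sample files; all names here are placeholders.
cd "$(mktemp -d)"
mkdir my_files
echo "print('hello')" > my_files/script.py
echo "example notes" > my_files/notes.txt

# Bundle the folder into a plain tar archive.
tar -cf filename.tar my_files

# Bundle and gzip-compress in one step (-z selects gzip).
tar -czf filename.tar.gz my_files

# List the contents without extracting, to check the archive before uploading.
tar -tzf filename.tar.gz
```

Listing the archive first is a cheap way to confirm that it contains what you expect before you upload it to the Platform.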

Migrating Work That is Not in a Version-Controlled Repository

In this case, your files live in a folder, either locally or in a remote environment, that is not under version control.

Warnings

  • Avoid copying or moving large data files into your project. If your team works in the cloud, keep these files in shared storage such as Amazon S3 or Azure Blob Storage instead. The Docker containers are of finite size, and you don't want to version control large data files. GitHub, for example, limits individual files to 100 MB. Keep your repository under 1 GB in size.
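
One way to act on this warning is to look for oversized files before committing anything. The helper below is a hedged sketch: the name find_large_files and its default threshold are hypothetical, and it assumes a find implementation that accepts size suffixes such as k and M (GNU and BSD find both do).

```shell
# List files larger than a threshold, skipping Git's own metadata.
# The function name and defaults are hypothetical.
# Usage: find_large_files [directory] [min-size]   e.g. find_large_files . 100M
find_large_files() {
    local dir="${1:-.}" min_size="${2:-100M}"
    find "$dir" -type f -size +"$min_size" ! -path '*/.git/*'
}
```

Running find_large_files . 100M from the top of your folder prints any file above the 100 MB limit mentioned above; those files are candidates for shared storage rather than the repository.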