Working in a Session

Sessions combine interactive data science tools with packages and compute resources. Sessions are perfect for iterative analytical work, such as exploratory data analysis or feature engineering. The Platform currently supports Jupyter and RStudio, with Zeppelin coming soon.

When you launch a session, you can select from the default set of environments created by your administrator. You can install additional libraries once inside a session, just as you would on a laptop (for example, using pip in Python). To learn more, see the Environments section.

The DataScience.com Platform currently supports two interactive session tools:

  • Jupyter: Jupyter is a staple of the open source Python data community, and it also has kernels for R and many other languages. For more resources, see the Project Jupyter community page.
  • RStudio: RStudio is a full-featured development environment aimed primarily at R programmers. The DataScience.com Platform supports the open source version of RStudio. For more information, see the RStudio documentation.

Launch a session

To start a session, select “Launch a Session” from the project actions button, then configure the following options:

  • Branch: Choose the branch of your repo that you’ll work on. The files from the most recent commit on that branch will be available in the session.
  • Name: Optionally name the session to help you keep track of multiple concurrent sessions.
  • Tool: Choose an interactive tool to use in your session.
  • Compute Resource: Select from a list of machine sizes specified by your administrator.
  • Environment: Choose a set of pre-installed libraries. For more on environments, see the Environments and Dependencies page.
  • Additional Requirements: Install additional dependencies at runtime from a text file. For more on additional requirements, see the Environments and Dependencies page.
_images/5383d23-Screen_Shot_2017-09-06_at_1.51.23_PM.png
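
As an illustration of the Additional Requirements option, the sketch below writes a small text file of extra dependencies. It assumes the Platform reads the file in pip's requirements format, which this page does not confirm; the file name and the packages listed are hypothetical. See the Environments and Dependencies page for the supported format.

```shell
# Write a hypothetical additional-requirements file.
# Assumption: one pip-style requirement per line.
cat > requirements.txt <<'EOF'
requests
scikit-learn>=0.19
EOF

# Show what was written.
cat requirements.txt
```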

You can navigate back to a running session from your project’s Activity tab, or from the Running Resources menu, shown here:

_images/0aa0688-running_resources.png

Sync changes

Sessions follow the same workflow as Git on a personal computer: the session clones a branch, the Sync menu stages your changes automatically, and you push those changes back to the Git remote with a commit message.

After you’ve made changes to your files in a session, you can save them by syncing back to the Git repo. In the Platform bar at the top of your session, open the Session menu and select Sync.

_images/e9a5215-first-sync-screenshot.png

On the Sync menu, you’ll see which files have been added, deleted, or modified. Using the checkboxes, you can select which files you would like to sync. You can enter an optional commit message and then sync your changes back to the Git repo.

_images/4ad275a-screenshot-1_1.png

Warning

Be mindful of file sizes. Most Git providers limit the size of the files you can store. For example, GitHub limits files to 100 MB. The DataScience.com Platform web app also has an upload/download limit of 200 MB, which affects downloading files from the Jupyter file browser.
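
One way to spot oversized files before syncing is to list them from a terminal in the session. The sketch below is not a Platform feature: large_files is a hypothetical helper, and it reads the 100 MB limit as 100 × 1024 × 1024 bytes, which you may need to adjust for your Git provider.

```shell
# List files under a directory that are larger than a byte limit,
# skipping Git's own metadata. large_files is a hypothetical helper.
large_files() {
    dir=$1
    limit_bytes=$2
    find "$dir" -type f -size +"${limit_bytes}c" ! -path '*/.git/*'
}

# Assumption: 100 MB is read as 100 * 1024 * 1024 bytes.
limit=$((100 * 1024 * 1024))
large_files . "$limit"
```

Any path it prints is a file that the Git provider is likely to reject when you sync, so deselect it in the Sync menu or store it elsewhere.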

If the file changes you’ve made don’t conflict with changes your team has made since you started your session, the Platform will push all your files as a new commit to the active branch.

If there are conflicts, you’ll have two choices:

  • Cancel: This option returns your Git status to the state it was in when you selected Sync. You can keep working and resolve the conflicts manually using the Jupyter or RStudio file editors.
  • Create Branch: This option creates a new branch and pushes your changes to that branch. The parent of the branch will be the commit that was originally loaded into your session.

Git commands behind the scenes

Below are the exact commands that run for each Sync feature.

Loading the Sync menu:

git status

Sync action:

git add .
git commit -m "<message you provide>"
git fetch
git merge <branch you chose when launching> --no-commit --no-ff

Cancelling a Sync after a conflict:

git reset

Creating a new branch after a conflict:

git branch <name you provide>
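
To see how these commands behave when a teammate has changed the same file, the sequence can be replayed outside the Platform. The following is a minimal sketch and not the Platform's implementation. It uses a local bare repository as a stand-in for the Git remote, and the branch name analysis, the file notes.txt, the branch name sync-rescue, the merge target origin/analysis, and the final git push are all assumptions made for illustration.

```shell
#!/bin/sh
# Replays the documented Sync commands in a throwaway repository.
# All names (analysis, notes.txt, sync-rescue) are hypothetical.
set -eu

work=$(mktemp -d)
remote="$work/remote.git"
git init --quiet --bare "$remote"      # stands in for the Git remote

# Give a clone a local commit identity so commits work on any machine.
ident() {
    git -C "$1" config user.name "Demo User"
    git -C "$1" config user.email "demo@example.com"
}

# Seed the remote with one commit on a branch named "analysis".
git clone --quiet "$remote" "$work/seed" 2>/dev/null
ident "$work/seed"
git -C "$work/seed" checkout --quiet -b analysis
echo "original" > "$work/seed/notes.txt"
git -C "$work/seed" add .
git -C "$work/seed" commit --quiet -m "Initial commit"
git -C "$work/seed" push --quiet origin analysis

# The session clones the branch; base is the commit loaded into it.
git clone --quiet --branch analysis "$remote" "$work/session"
ident "$work/session"
base=$(git -C "$work/session" rev-parse HEAD)

# Meanwhile, a teammate pushes a change to the same line.
git clone --quiet --branch analysis "$remote" "$work/teammate"
ident "$work/teammate"
echo "teammate edit" > "$work/teammate/notes.txt"
git -C "$work/teammate" commit --quiet -am "Teammate change"
git -C "$work/teammate" push --quiet origin analysis

# Sync action: stage, commit, fetch, then attempt the merge.
echo "session edit" > "$work/session/notes.txt"
git -C "$work/session" add .
git -C "$work/session" commit --quiet -m "Session change"
git -C "$work/session" fetch --quiet
# Assumption: the launch branch is merged via its remote-tracking ref.
if git -C "$work/session" merge origin/analysis --no-commit --no-ff >/dev/null 2>&1; then
    outcome="clean"
    parent=""
else
    outcome="conflict"
    # "Create Branch" path: the branch points at the session commit.
    git -C "$work/session" branch sync-rescue
    # Assumption: the new branch is then pushed like this.
    git -C "$work/session" push --quiet origin sync-rescue
    parent=$(git -C "$remote" rev-parse "sync-rescue^")
fi

echo "outcome: $outcome"
if [ "$parent" = "$base" ]; then
    echo "parent of sync-rescue is the commit loaded into the session"
fi

rm -rf "$work"
```

Because both clones edit the only line of notes.txt, the merge stops with a conflict, and the new branch holds the session commit on top of the commit that the session started from.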

Shut down a session

A session will run and consume compute resources until you stop it. To shut down a session, navigate to the Session menu in the top bar and select Shutdown.

Warning

You can’t recover unsaved changes from a session after shutting down. If you want to save the work you have done, make sure to sync your files before shutting down.