Platform Configuration

Introduction

This guide gives a brief overview of the key Platform specifications that Platform users should know.

Each installation is unique and comes with its own constraints. Check with your IT department for any differences between this guide and your custom installation.

Instance Footprint

Production

In a production instance, the DataScience.com Platform requires the following minimum number of servers or nodes. Note that it’s crucial for the sake of both stability and security to separate the nodes that run the core Platform services from the nodes that host user workloads.

3 Master Nodes

These run the core components of the Platform, split evenly across the three nodes in a leaderless cluster. If one or more nodes fail, the remaining node(s) take on the additional load of the failed Master(s).

These hosts should meet or exceed the minimum requirements in the Host Requirements section below.

2 Postgres Nodes

The two Postgres nodes consist of a master database and a standby database. In cloud installations, these can be replaced by a single redundant managed database, such as Amazon RDS.

The hosts for these nodes should meet or exceed the requirements in the Databases section below.

Worker Nodes

User workloads are hosted on separate Worker nodes for increased stability and security. The number of Worker nodes and their sizing depend on the number of Platform users and the resources you plan to allocate to each.

For instances of the Platform that are hosted on Amazon AWS, we also support on-demand instances for user workloads.

Host Requirements

Per host:

  • RAM: 32GB
  • CPU: 8 cores
  • Disk Space: 300GB
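
A host can be checked against these minimums before installation. The following is a minimal sketch, not part of the Platform installer: the function names are hypothetical, it assumes a Linux host, it treats "GB" as GiB, and it counts logical CPUs rather than physical cores.

```python
import os
import shutil

# Minimums taken from the Host Requirements list above.
MIN_RAM_GB = 32
MIN_CPU_CORES = 8
MIN_DISK_GB = 300


def check_host(ram_gb, cpu_cores, disk_gb):
    """Return a list of shortfalls; an empty list means the host passes."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f}GB < {MIN_RAM_GB}GB")
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores < {MIN_CPU_CORES} cores")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f}GB < {MIN_DISK_GB}GB")
    return problems


def measure_host(path="/"):
    """Read RAM, logical CPU count, and total disk size for `path` (Linux)."""
    gib = 1024 ** 3
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    cpu_cores = os.cpu_count() or 0
    disk_bytes = shutil.disk_usage(path).total
    return ram_bytes / gib, cpu_cores, disk_bytes / gib
```

Calling `check_host(*measure_host())` on a node returns the list of unmet requirements. The operating system usually reports slightly less memory than the nominal size, so you may want to loosen the RAM threshold a little in practice.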

Supported Operating Systems

(64-bit distributions)

.deb Distributions

  • Debian 9 (kernel 4.9+)
  • Ubuntu 16.04 (kernel 4.4+)

.rpm Distributions

  • Fedora 24 (kernel 4.11+)
  • Red Hat Enterprise Linux 7.3 (kernel 3.10.0-514+)
  • CentOS 7.3 (kernel 3.10.0-514+)
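
The kernel minimums above can be compared against a running host. This is a sketch under stated assumptions: `parse_kernel` and `kernel_at_least` are hypothetical helpers, and the parser only reads the leading numeric fields of the release string (for example the `514` in `3.10.0-514.el7.x86_64`).

```python
import platform
import re


def parse_kernel(release):
    """Turn a release such as '3.10.0-514.el7.x86_64' into (3, 10, 0, 514)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Missing fields (e.g. the patch level in '4.9') count as zero.
    return tuple(int(part) if part else 0 for part in match.groups())


def kernel_at_least(release, minimum):
    """True if `release` is the same as or newer than `minimum`."""
    return parse_kernel(release) >= parse_kernel(minimum)


def running_kernel_ok(minimum):
    """Compare the kernel of the current host against `minimum`."""
    return kernel_at_least(platform.release(), minimum)
```

For example, `running_kernel_ok("3.10.0-514")` would be the check for a Red Hat Enterprise Linux 7.3 or CentOS 7.3 host, and `running_kernel_ok("4.4")` for Ubuntu 16.04.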

Supported Browsers

The DataScience.com Platform relies on native flexbox support, which requires the following minimum versions:

  • Apple Safari 10+
  • Google Chrome 49+
  • Microsoft Edge 14+
  • Microsoft Internet Explorer 11+ (partial support; there are some known issues with flexbox)
  • Mozilla Firefox 51+
  • Opera 43+

Additional Software

The installation script for the DataScience.com Platform will automatically install the correct version of docker-engine; please ensure this version is not overwritten by other configuration management tools.

  • docker 17.06-ce+
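
To confirm that another configuration management tool has not replaced the installed Docker version, the version can be compared against the minimum above. This is a minimal sketch: the helper names are hypothetical, and it assumes the `docker` CLI is on the `PATH` and that the version string starts with `year.month`, as in `17.06.2-ce`.

```python
import re
import subprocess

# Minimum from the list above: docker 17.06-ce.
MIN_DOCKER = (17, 6)


def parse_docker_version(text):
    """Turn a version such as '17.06.2-ce' into (17, 6, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Docker version: {text!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def docker_ok(text):
    """True if the given version meets the 17.06 minimum."""
    return parse_docker_version(text)[:2] >= MIN_DOCKER


def installed_docker_version():
    """Ask the local Docker daemon for its server version."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

On a node, `docker_ok(installed_docker_version())` reports whether the running engine still satisfies the requirement.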

Email Integration

The DataScience.com Platform requires an email server to send invitations and collaboration notifications. You will need an SMTP server address, port, username, password, and “From” address. Please see your System Administrator for more details.
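
Before entering the SMTP details, it can help to verify them independently. The sketch below is not how the Platform itself sends mail; it is a standalone check using Python's standard `smtplib`. The `SmtpSettings` class, the function names, and the example addresses are assumptions made for illustration, and the sketch assumes that port 465 uses implicit TLS while any other port is a plain connection.

```python
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class SmtpSettings:
    """The five values listed above (hypothetical container)."""
    host: str
    port: int
    username: str
    password: str
    from_address: str


def build_test_message(settings, recipient):
    """Build a short plain-text message from the configured "From" address."""
    msg = EmailMessage()
    msg["From"] = settings.from_address
    msg["To"] = recipient
    msg["Subject"] = "SMTP connectivity test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(settings, recipient, timeout=10):
    """Log in and send one message; raises smtplib.SMTPException on failure."""
    msg = build_test_message(settings, recipient)
    # Assumption: 465 means implicit TLS, anything else is unencrypted.
    smtp_class = smtplib.SMTP_SSL if settings.port == 465 else smtplib.SMTP
    with smtp_class(settings.host, settings.port, timeout=timeout) as server:
        server.login(settings.username, settings.password)
        server.send_message(msg)
```

A call such as `send_test_message(SmtpSettings("smtp.example.com", 465, "user", "secret", "noreply@example.com"), "you@example.com")` either delivers the message or raises an exception that names the failing step.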

Port Configuration

The following ports should be opened between the specified sources and destinations. “Administrative IP(s)” refers to the IP(s) from which Systems Administrators will need to access the instance. “User IP(s)” refers to the IP(s) from which users of the DataScience.com Platform will be accessing the application.

Caution

LDAP and SMTP Ports

For integrations such as LDAP and SMTP, we’ve provided the most commonly used ports. Please confirm these ports with your Service Administrator(s).

Port | Usage | Source(s) | Destination(s)
25 | Unencrypted SMTP traffic | Master node | SMTP server
80 | HTTP (redirects to HTTPS) | Administrative IP(s) & User IP(s) | Master node
389 (optional) | Non-SSL LDAP traffic | Master node | LDAP server
443 | HTTPS | Administrative IP(s) & User IP(s) | Master node
465 | Encrypted SMTP traffic | Master node | SMTP server
636 (optional) | SSL LDAP traffic | Master node | LDAP server
2376 | Docker remote socket | Master node | All nodes
2377 | Docker Swarm API | All nodes | All nodes
5000 | Logstash ingress | All nodes | Master node
5432 | Postgres traffic | Master & Core nodes | Postgres endpoint
7946 | Docker Swarm | All nodes | All nodes
8080 | HTTP (redirects to HTTPS) | Administrative IP(s) & User IP(s) | Master node
8085 | GitHub OAuth authentication; DS Docker Event Listener | github.com & All nodes | All nodes
8300-8302 | Consul | All nodes | All nodes
8500 | Consul | All nodes | All nodes
8600 | Consul | All nodes | All nodes
8686 | Darkroom | All nodes | All nodes
8800 | Admin Console | Administrative IP(s) | Master node
8830 | Acquiesce | All nodes | All nodes
8899 | Graphite & Statsd | All nodes | Master node
9870-9880 | Cluster management | All nodes | All nodes
32768-61000 | Proxy routing to containers | Master node | All nodes
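
Once the firewall rules are in place, reachability can be spot-checked from a source host listed above. The following is a minimal sketch with hypothetical function names. It tests TCP only, and a port shows as open only if a service is already listening on the destination, so a closed result on a fresh node does not by itself indicate a firewall problem.

```python
import socket


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, from a user workstation, `check_ports("platform.example.com", [80, 443, 8080])` covers the ports that User IP(s) need on the Master node; the hostname here is a placeholder for your own Master node address.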

Important

Connecting to data sources

In addition to the above, please ensure that routes are open between the DataScience.com Platform and whatever data sources you plan to connect.

Git Providers

In order to create projects in the DataScience.com Platform, you must integrate with a Git provider. We currently support the following:

  • GitHub.com
  • GitHub Enterprise 2.9+
  • Bitbucket.org
  • GitLab.com
  • GitLab Enterprise 7+

Limits

File upload/download limit: 200MB
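
A file can be checked against this limit before an upload is attempted. This sketch uses hypothetical names and assumes that "200MB" means 200 × 1,000,000 bytes; if your installation counts in binary megabytes (1,048,576 bytes), adjust the constant accordingly.

```python
import os

# Assumption: decimal megabytes. The guide does not say which unit applies.
UPLOAD_LIMIT_BYTES = 200 * 1000 * 1000


def within_upload_limit(size_bytes, limit=UPLOAD_LIMIT_BYTES):
    """True if a file of `size_bytes` fits within the transfer limit."""
    return size_bytes <= limit


def file_within_limit(path, limit=UPLOAD_LIMIT_BYTES):
    """Check the size of the file at `path` against the limit."""
    return within_upload_limit(os.path.getsize(path), limit)
```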

Google Compute Engine and SMTP: GCE does not currently support the use of standard SMTP servers. They do, however, offer support for their own Gmail service as well as several third party providers, including SendGrid.

Optional Supported Integrations

LDAP & Active Directory

If you use LDAP or Active Directory to manage users in your organization, your System Administrator can configure your DataScience.com Platform to use this integration to set up users and permissions. When LDAP or Active Directory is enabled, you can optionally enable Single Sign-On as well.

On-Demand Compute Resources

If your DataScience.com Platform runs on Amazon AWS, your System Administrator can configure some or all of your services to run on demand, which can reduce costs and resource usage.

3rd Party Logging

The DataScience.com Platform currently has optional integrations with Loggly and Datadog. If you use either of these for logging or monitoring, your System Administrator can add your API keys to the Platform to send data to these services.