This guide gives a brief overview of the key Platform specifications that users of the Platform should know.
Each installation is unique and comes with its own constraints. Double-check with your IT department for any differences between this guide and your custom installation.
In a production instance, the DataScience.com Platform requires the following minimum number of servers or nodes. For both stability and security, it is crucial to separate the nodes that run the core Platform services from the nodes that host user workloads.
3 Master Nodes
These run the core components of the Platform, split evenly across the three nodes in a leaderless cluster. If one or more nodes fail, the remaining node(s) take on the load of the failed Master node(s).
These hosts should meet or exceed the minimum requirements in the Host Requirements section below.
2 Postgres Nodes
The two Postgres nodes consist of a master database and a standby database. In cloud installations, these can be replaced by a single redundant managed database, such as the one AWS RDS offers.
The hosts for these nodes should meet or exceed the requirements in the Databases section below.
User workloads are hosted on separate Worker nodes for increased stability and security. The number of Worker nodes and their sizing depend on the number of Platform users and how many resources you plan to allocate to each.
For instances of the Platform that are hosted on AWS, we also support on-demand instances for user workloads.
- RAM: 32GB
- CPU: 8 cores
- Disk Space: 300GB
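The minimums above can be checked on a candidate host with a short script. The following is a minimal sketch, not part of the Platform installer: the function names `check_host` and `measure_host` are hypothetical, the measurement relies on `os.sysconf` names available on Linux, and it assumes binary gigabytes, which the guide does not specify. A machine sold as "32GB" often reports slightly less usable memory, so treat a near miss as a prompt to look closer rather than a hard failure.

```python
import os
import shutil

# Minimums copied from the host requirements list above.
MIN_RAM_GB = 32
MIN_CPU_CORES = 8
MIN_DISK_GB = 300


def check_host(ram_gb, cpu_cores, disk_gb):
    """Return a list of shortfalls; an empty list means the host meets the minimums."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.0f}GB < {MIN_RAM_GB}GB")
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores < {MIN_CPU_CORES} cores")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.0f}GB < {MIN_DISK_GB}GB")
    return problems


def measure_host(path="/"):
    """Measure RAM, CPU count, and total disk size of this host (Linux)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    gib = 1024 ** 3
    return ram_bytes / gib, os.cpu_count() or 0, disk_bytes / gib


if __name__ == "__main__":
    shortfalls = check_host(*measure_host())
    for problem in shortfalls:
        print("Below minimum:", problem)
    if not shortfalls:
        print("Host meets the minimum requirements.")
```

Keeping the comparison in `check_host` separate from the measurement makes it easy to evaluate a planned machine size without running the script on that machine.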
Supported Operating Systems
- Debian 9 (kernel 4.9+)
- Ubuntu 16.04 (kernel 4.4+)
- Fedora 24 (kernel 4.11+)
- Red Hat Enterprise Linux 7.3 (kernel 3.10.0-514+)
- CentOS 7.3 (kernel 3.10.0-514+)
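The kernel minimums in this list can be compared against a host's running kernel. The sketch below is illustrative only: the helper names and the distribution keys are assumptions (the keys follow the `ID` values commonly found in `/etc/os-release`), and it checks the kernel release string alone, not the distribution version.

```python
import platform
import re

# Minimum kernel per supported distribution, copied from the list above.
MIN_KERNELS = {
    "debian": "4.9",
    "ubuntu": "4.4",
    "fedora": "4.11",
    "rhel": "3.10.0-514",
    "centos": "3.10.0-514",
}


def kernel_tuple(release):
    """Turn a release such as '3.10.0-514.el7.x86_64' into (3, 10, 0, 514)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Missing components (for example the patch level in '4.9') count as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def kernel_ok(distro, release):
    """True if `release` is at least the minimum kernel listed for `distro`."""
    return kernel_tuple(release) >= kernel_tuple(MIN_KERNELS[distro])


if __name__ == "__main__":
    # The distribution is passed in by hand here; detecting it is out of scope.
    print("centos:", kernel_ok("centos", platform.release()))
```

For example, `kernel_ok("centos", "3.10.0-327.el7.x86_64")` is false because build 327 precedes 514, while `kernel_ok("ubuntu", "4.4.0-21-generic")` is true.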
The DataScience.com Platform relies on native flexbox support, which requires the following minimum versions:
- Apple Safari 10+
- Google Chrome 49+
- Microsoft Edge 14+
- Microsoft Internet Explorer 11+ (partial support; there are some known issues with flexbox)
- Mozilla Firefox 51+
- Opera 43+
The installation script for the DataScience.com Platform will automatically install the correct version of docker-engine; please ensure this version is not overwritten by other configuration management tools.
- docker 17.06-ce+
The DataScience.com Platform requires an email server to send invitations and collaboration notifications. You will need an SMTP server address, port, username, password, and “From” address. Please see your System Administrator for more details.
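Before entering the SMTP details into the Platform, it can help to confirm that they work. The sketch below uses Python's standard `smtplib` and is independent of the Platform itself. The function names are hypothetical, and the choice of implicit TLS on port 465 versus a plain connection otherwise is an assumption based on the ports this guide lists for SMTP traffic; your mail server may require something different (for example STARTTLS).

```python
import smtplib
from email.message import EmailMessage


def build_test_message(from_addr, to_addr):
    """Build a small plain-text message for checking SMTP settings."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = "SMTP settings check"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(host, port, username, password, from_addr, to_addr):
    """Send the test message using the five values named in the text above."""
    msg = build_test_message(from_addr, to_addr)
    # Assumption: port 465 means implicit TLS; any other port is plain SMTP.
    smtp_class = smtplib.SMTP_SSL if port == 465 else smtplib.SMTP
    with smtp_class(host, port, timeout=10) as server:
        if username:
            server.login(username, password)
        server.send_message(msg)
```

A call such as `send_test_message("smtp.example.com", 465, "user", "secret", "platform@example.com", "you@example.com")` uses placeholder values throughout; substitute the ones your System Administrator provides.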
The following ports should be opened between the specified sources and destinations. “Administrative IP(s)” refers to the IP(s) from which Systems Administrators will need to access the instance. “User IP(s)” refers to the IP(s) from which users of the DataScience.com Platform will be accessing the application.
LDAP and SMTP Ports
For integrations such as LDAP and SMTP, we’ve provided the most commonly used ports. Please confirm these ports with your Service Administrator(s).
| Port | Purpose | Source | Destination |
|---|---|---|---|
| 25 | Unencrypted SMTP traffic | Master node | SMTP server |
| 80 | HTTP (redirects to HTTPS) | Administrative IP(s) & User IP(s) | Master node |
| 389 (optional) | Non-SSL LDAP traffic | Master node | LDAP server |
| 443 | HTTPS | Administrative IP(s) & User IP(s) | Master node |
| 465 | Encrypted SMTP traffic | Master node | SMTP server |
| 636 (optional) | SSL LDAP traffic | Master node | LDAP server |
| 2376 | Docker remote socket | Master node | All nodes |
| 2377 | Docker Swarm API | All nodes | All nodes |
| 5000 | Logstash ingress | All nodes | Master node |
| 5432 | Postgres traffic | Master & Core nodes | Postgres endpoint |
| 7946 | Docker Swarm | All nodes | All nodes |
| 8080 | HTTP (redirects to HTTPS) | Administrative IP(s) & User IP(s) | Master node |
| 8085 | GitHub OAuth authentication; DS Docker Event Listener | github.com & All nodes | All nodes |
| 8300-8302 | Consul | All nodes | All nodes |
| 8500 | Consul | All nodes | All nodes |
| 8600 | Consul | All nodes | All nodes |
| 8686 | Darkroom | All nodes | All nodes |
| 8800 | Admin Console | Administrative IP(s) | Master node |
| 8830 | Acquiesce | All nodes | All nodes |
| 8899 | Graphite & Statsd | All nodes | Master node |
| 9870-9880 | Cluster management | All nodes | All nodes |
| 32768-61000 | Proxy routing to containers | Master node | All nodes |
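Once firewall rules are in place, you can confirm from a source host that the listed ports on a destination are reachable. The following is a minimal sketch using Python's standard `socket` module; the function names are hypothetical. The table does not state protocols, so this probes TCP only, and it reports a port as open only if something is listening on it at the time of the check.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports):
    """Map each port in `ports` to whether it accepted a TCP connection."""
    return {port: port_open(host, port) for port in ports}
```

For example, running `check_ports("master.example.internal", [443, 8800])` from an administrative machine (the hostname is a placeholder) tests two of the routes whose destination is the Master node.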
Connecting to data sources
In addition to the above, please ensure that routes are open between the DataScience.com Platform and whatever data sources you plan to connect.
In order to create projects in the DataScience.com Platform, you must integrate with a Git provider. We currently support the following:
- GitHub Enterprise 2.9+
- GitLab Enterprise 7+
File upload/download limit: 200MB
Google Compute Engine and SMTP: GCE does not currently support the use of standard SMTP servers. It does, however, support Google's own Gmail service as well as several third-party providers, including SendGrid.
Optional Supported Integrations
LDAP & Active Directory
If you use LDAP or Active Directory to manage users in your organization, your System Administrator can configure your DataScience.com Platform to use this integration to set up users and permissions. Optionally, when LDAP or Active Directory is enabled, you can also enable Single Sign-On.
On-Demand Compute Resources
If your DataScience.com Platform runs on AWS, your System Administrator can configure some or all of your services to run on demand. This can reduce costs and resource usage.
3rd Party Logging
The DataScience.com Platform currently has optional integrations with Loggly and Datadog. If you use either of these for logging or monitoring, your System Administrator can add your API keys to the Platform to send data to these services.