Installation (AEN 4.1.2)

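Overview

This installation process covers the steps required to install a basic Anaconda Enterprise Notebooks (AEN) system, which consists of a front-end server, one or more gateways, and one or more compute nodes.

If you have any questions about the installation instructions, please contact your sales representative or the Priority Support team.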

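Components

The AEN platform consists of three main service groups: AEN Server, AEN Gateway, and AEN Compute. These services can be distributed across multiple servers (recommended) or run on a single machine.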

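Server

The Server component is the administrative front end of the system. This is where users log in, where user accounts are stored, and where administrators manage the system. It is also the component that interacts with the database.

The Server is the main entry point for all users. It handles project setup and ensures that users are sent to the correct Data Center for a given project.

Anaconda Enterprise Notebooks uses MongoDB to store internal data. MongoDB usually runs on the same host as the Server, but it can also be deployed on a separate host.

The Server uses NGINX to handle the user-facing web interface. NGINX acts as a request proxy. The actual Server web process runs on a high-numbered port that listens only on localhost, and NGINX forwards requests there. The NGINX server is also responsible for static content.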

Gateway

The Gateway is a reverse proxy that authenticates users and automatically directs them to the proper AEN Compute machine for their project.

The Gateway provides a single access point to a set of Compute Nodes, and acts as a proxy service to manage authorization and mapping of URLs and ports to services that are running on Compute Nodes, thus providing a consistent uniform interface for the user.

For firewall reasons, you generally need one Gateway for each physical location in your organization that uses AEN.

Users will not notice the Gateway as it automatically routes requests to the proper Compute Node.

Compute Nodes

Compute Nodes are where Apps (such as Jupyter Notebook and Workbench) actually run. These are also the hosts that a user would see in a terminal session or if they used SSH to access the node. It is where all user-visible programs run. Each Project is associated with one or more Compute Nodes, and these in turn are part of a single Data Center. Compute Nodes need only be reachable by the AEN Gateway, so they can be completely isolated by a firewall.

Component organization

[Figure: AEN component organization diagram (components.png)]

Organizationally, each Anaconda Enterprise Notebooks installation has exactly one Server instance. One or more Gateway instances can be configured and each Compute Node can only connect to one Gateway. The collection of Compute Nodes served by a single Gateway will be referred to as a Data Center. New Data Centers can be added to the AEN installation at any time.

For example, an Anaconda Enterprise Notebooks deployment with two Data Centers, where one Gateway serves a cluster of 20 physical computers and the second Gateway serves 30 virtual machines, would have the following complement of services installed and running:

1  AEN Server instance
2  AEN Gateway instances
50 AEN Compute instances (20 + 30)
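
The tally above follows directly from the rules in this section: one Server per installation, one Gateway per Data Center, and one Compute instance per Compute Node. A minimal sketch of that arithmetic is shown below; the function name count_aen_services is a hypothetical helper, not part of AEN.

```shell
# Hypothetical helper (not part of AEN): tally the service instances
# implied by a list of per-Data-Center Compute Node counts.
count_aen_services() {
    # Each argument is the number of Compute Nodes in one Data Center.
    gateways=$#            # one Gateway per Data Center
    compute=0
    for n in "$@"; do
        compute=$((compute + n))
    done
    # Every installation has exactly one Server instance.
    echo "server=1 gateway=$gateways compute=$compute"
}

# The example from the text: 20 physical computers and 30 virtual machines.
count_aen_services 20 30
# prints: server=1 gateway=2 compute=50
```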

Anaconda Enterprise Notebooks users interact with the system predominantly through Projects. A Project is a set of conda environments, Jupyter Notebooks, and other Apps that can be accessed by a Team of users.

Each Project is associated with a single Data Center within the AEN environment. The Team of users includes one Owner, who is the user who created the Project.

Since Anaconda Enterprise Notebooks is web-based, it uses configurable HTTP ports on the Server.

Installers

The Anaconda Enterprise Notebooks installers are available to paid customers only. If you are interested in a demonstration of Anaconda Enterprise Notebooks, please contact us.

Distributed install

In a distributed install the Server and Gateway run on separate hosts.

Single box install

In a single-box install, the Server and the Gateway run on the same host. Because they are independent services, each one needs its own external port.

Installation requirements

Ensure you have the proper hardware and software resources before installing AEN.

Hardware requirements

See System Requirements for all Anaconda Enterprise hardware requirements.

NOTE: We recommend putting /opt/wakari and /projects on the same filesystem. If the project and conda env directories are on separate filesystems, then more disk space will be required on compute nodes and performance will be worse.

Software requirements

  • Red Hat/CentOS versions 6.5 to 7.2 on all nodes (Other Linux distros are supported, but this installation document assumes Red Hat or CentOS.)
  • Linux home directories are required since Jupyter looks in $HOME for profiles and extensions.
  • /opt/wakari: Ability to install here and at least 10 GB of storage.
  • /projects: Size depends on number and size of projects. At least 20 GB of storage.
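
The storage minimums above (10 GB for /opt/wakari, 20 GB for /projects) and the same-filesystem recommendation can be checked before installing. The following is a minimal sketch, assuming a Linux host with POSIX df and GNU stat; the function names are hypothetical helpers, not AEN commands.

```shell
# Hypothetical pre-install helpers (not part of AEN).

avail_gb() {
    # Print the whole number of GB available on the filesystem holding $1.
    # df -Pk gives POSIX-format output in 1K blocks; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

meets_min_gb() {
    # $1 = available GB, $2 = required GB; succeeds when $1 >= $2.
    [ "$1" -ge "$2" ]
}

same_filesystem() {
    # Succeeds when both paths live on the same device (GNU stat, %d = device number).
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Example usage on a host where both directories already exist:
#   meets_min_gb "$(avail_gb /opt/wakari)" 10 || echo "/opt/wakari: less than 10 GB free"
#   meets_min_gb "$(avail_gb /projects)" 20   || echo "/projects: less than 20 GB free"
#   same_filesystem /opt/wakari /projects     || echo "directories are on different filesystems"
```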

Linux system accounts required

Some Linux system accounts (UIDs) are added to the system during installation. If your organization requires special actions, here is the list of UIDs:

  • mongod (Red Hat) or mongodb (Ubuntu/Debian): Created by the RPM or deb package
  • elasticsearch: Created by RPM or deb package
  • nginx: Created by RPM or deb package
  • AEN_SRVC_ACCT: Created during installation of Anaconda Enterprise Notebooks, and defaults to “wakari”
  • ANON_USER: An account such as public or anonymous on the Compute Node. If this user is not found, AEN_SRVC_ACCT will try to create it. If that fails, projects will fail to start.
  • ACL: The AEN directories need a filesystem mounted with POSIX ACL (Access Control List) support (POSIX.1e). Check with mount and tune2fs -l /path/to/filesystem | grep options
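
The account list and the ACL check above can be scripted. The following is a minimal sketch; the function names are hypothetical helpers, and the ACL check assumes an ext-family filesystem, since that is what tune2fs reports on.

```shell
# Hypothetical helpers (not part of AEN).

missing_accounts() {
    # Print every account name from the arguments that `id` cannot resolve.
    for name in "$@"; do
        id -u "$name" >/dev/null 2>&1 || echo "$name"
    done
}

has_acl_option() {
    # Reads `tune2fs -l` output on stdin and succeeds when the
    # "Default mount options" line contains the word "acl".
    grep -i 'mount options' | grep -qw acl
}

# Example usage after installation on a Red Hat host, assuming the default
# service account name "wakari":
#   missing_accounts mongod elasticsearch nginx wakari
#   tune2fs -l /path/to/filesystem | has_acl_option || echo "ACL not in default mount options"
```

An empty result from missing_accounts means every listed account exists. The mount command from the text remains useful as well, because ACL support can also be enabled through mount options rather than filesystem defaults.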

Additional software requirements

AEN Server

  • MongoDB version: >= 2.6.8 and < 3.0
  • NGINX version: >= 1.6.2
  • Elasticsearch version: >= 1.7.2
  • Oracle JRE 7 or 8
  • bzip2

AEN Gateway

No additional software prerequisites.

AEN Compute Node

  • git
  • bzip2
  • bash (Red Hat default) or zsh
  • X Window System

Note: If you don’t want to install the whole X Window System, you still need to install the following packages for R plotting support:

sudo yum install libXrender libXext libXdmcp libSM libICE libXt \
dejavu-sans-fonts dejavu-serif-fonts dejavu-fonts-common \
fontpackages-filesystem
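
The AEN Server prerequisites above are given as version ranges, for example MongoDB >= 2.6.8 and < 3.0. A minimal sketch of testing a version string against those ranges follows. The function names are hypothetical helpers, and the comparison assumes GNU sort with the -V (version sort) option.

```shell
# Hypothetical helpers (not part of AEN); require GNU sort -V.

version_ge() {
    # Succeeds when version $1 >= version $2.
    # After a version sort, the smaller (or equal) version comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

mongo_version_ok() {
    # MongoDB must be >= 2.6.8 and < 3.0.
    version_ge "$1" 2.6.8 && ! version_ge "$1" 3.0
}

nginx_version_ok() {
    # NGINX must be >= 1.6.2.
    version_ge "$1" 1.6.2
}

# Example usage with version strings you have already obtained:
#   mongo_version_ok 2.6.12 && echo "MongoDB version is in range"
#   nginx_version_ok 1.4.7  || echo "NGINX is too old"
```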

Security requirements

  • Root or sudo access
  • SELinux in Permissive or Disabled mode

One way to change SELinux to permissive or disabled mode is to edit the /etc/sysconfig/selinux file, using root or sudo access, and set the SELINUX parameter to either permissive or disabled:

/etc/sysconfig/selinux

Change the SELINUX line in the file, shown below with the value enforcing, and reboot for the change to take effect:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.

SELINUX=enforcing

# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.

SELINUXTYPE=targeted

Verify changes with getenforce.
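
The edit described above can also be made non-interactively. The following is a minimal sketch using sed; the function names are hypothetical helpers, and the file path is passed as an argument so the functions can be tried on a copy first.

```shell
# Hypothetical helpers (not part of AEN).

selinux_configured_mode() {
    # Print the value of the uncommented SELINUX= line in file $1.
    # The pattern does not match SELINUXTYPE= or commented lines.
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$1" | tail -n 1
}

set_selinux_mode() {
    # $1 = config file, $2 = enforcing | permissive | disabled.
    case "$2" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid SELinux mode: $2" >&2; return 1 ;;
    esac
    # Rewrite only the SELINUX= line, keeping a .bak copy of the original file.
    sed -i.bak "s/^SELINUX=.*/SELINUX=$2/" "$1"
}

# Example usage (as root, followed by a reboot):
#   set_selinux_mode /etc/sysconfig/selinux permissive
#   selinux_configured_mode /etc/sysconfig/selinux
```

Note that selinux_configured_mode reports what the file says, while getenforce reports the mode currently in effect. The two differ until the system is rebooted.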

Network/TCP requirements

Note that all port numbers are configurable, but defaults are shown below.

Direction  Type  Default Port  Protocol       Optional  Configurable  Comments
Inbound    TCP   80            HTTP or HTTPS  No        Yes           Server
Inbound    TCP   8089          HTTP or HTTPS  No        Yes           Gateway
Inbound    TCP   5002          HTTP           No        Yes           Compute
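
Once the services are running, the default ports in the table can be probed from another host. The following is a minimal sketch, assuming bash (for its /dev/tcp feature) and the coreutils timeout command are available; the function names and the host names in the usage comments are hypothetical.

```shell
# Hypothetical helpers (not part of AEN).

aen_default_port() {
    # Print the default inbound TCP port for an AEN component.
    case "$1" in
        server)  echo 80 ;;
        gateway) echo 8089 ;;
        compute) echo 5002 ;;
        *) echo "unknown component: $1" >&2; return 1 ;;
    esac
}

port_open() {
    # $1 = host, $2 = port. Succeeds when a TCP connection can be opened
    # within 3 seconds. Uses bash's /dev/tcp pseudo-device.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example usage with placeholder host names:
#   port_open aen-server.example.com  "$(aen_default_port server)"  || echo "Server port closed"
#   port_open aen-gateway.example.com "$(aen_default_port gateway)" || echo "Gateway port closed"
```

If you changed any port from its default during configuration, pass that port number to port_open directly instead of using aen_default_port.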

Other requirements

Assuming the above requirements are met, there are no additional dependencies necessary for AEN.

Note: While not a requirement for running the software, these instructions use curl or wget to download packages used in the install process. You may use other appropriate means to put the needed files into the installation directory.

Install Steps

Carry out the procedures linked from the table below to perform a complete install of all Anaconda Enterprise Notebooks components.

The following optional install procedures may need to be performed, depending on how you set up your Data Center:

Additional post-install information: