What is Anaconda? Why Anaconda?

Anaconda is a comprehensive, curated, high-quality, and high-performance distribution of Python, R, and many associated packages for Linux, Windows, and macOS, intended for use by scientists. Dedicated Conda environments allow us to offer extensive collections of R and Python tools. Conda's version and dependency management ensures that the individual components within an environment are compatible. Moreover, users can easily install their own specialized toolsets into private environments.

Starting with 2020-03, we base our Anaconda installation on the Miniconda installer. To avoid version conflicts, we install individual toolsets into separate environments:

Discover current versions of modules described below by issuing
module avail Anaconda3

python-3.7.7-anaconda-2020.03
Python 3.7 with a comprehensive collection of more than 250 packages. Among these are NumPy, SciPy, and Matplotlib; frontends including IPython, Jupyter, and Spyder; and compilers/optimizers (e.g. Cython and Numba). The included Intel MKL ensures optimal performance of numerical methods. Preliminary tests have shown that Anaconda's Python outperforms locally compiled versions.

r-3.6.0-conda-2020.03
The R statistics software with more than 130 R libraries and the RStudio IDE for R.

tensorflow[-gpu]-2.1.0-conda-2020.03
TensorFlow machine learning tool in CPU and GPU variants.

pytorch-1.4.0-conda-2020.03
PyTorch (a machine learning framework and GPU-capable replacement for NumPy), downloaded from the PyTorch Conda channel. This version runs on both GPU and CPU.

miniconda-base-2020.03
Conda's base environment - starting point for individual user extensions.

...
More environments can be easily added to your own account (see below) or centrally upon request if of sufficient general interest.

Using Anaconda On The Leo Systems

The preinstalled Anaconda collections for Python and R are quite comprehensive and may satisfy many of your needs. This is covered in the following section.

If you need to install your own packages, proceed to the section "Extending Anaconda For Your Own Needs".

Using Pre-Installed Environments

To discover which Anaconda environments are installed, issue the command

module avail Anaconda3

For new projects, usually the most recent version is appropriate. The 5.x versions are now considered legacy.

To use any of the Anaconda environments, upon login or at the beginning of a batch job, first issue

module load Anaconda3/yyyy.mm/module-name-yyyy.mm

This will let you use the software provided with the respective module, and a restricted version of the conda command.

Issue

conda list

to see which packages are available in the current environment.

Extending Anaconda For Your Own Needs

Important: One-time preparation before you start

Environments are downloaded and built in your $HOME/.conda directory. This can easily grow to many gigabytes, overflowing your $HOME quota. Therefore we recommend you set up a symbolic link

$HOME/.conda -> $SCRATCH/.conda

First make sure that $HOME/.conda does not exist:

 $ ls -ld $HOME/.conda
ls: cannot access /home/cxxx/ccxxxyyyy/.conda: No such file or directory

If the symlink exists, you are done. If $HOME/.conda is a directory, decide if you want to rename or remove it.

Then create the symbolic link

ln -s $SCRATCH/.conda $HOME/.conda

Note that you are responsible for backup of your own data in $SCRATCH. Since it is easy to recreate environments, it is usually sufficient to record the necessary steps for successfully creating your environment(s).
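The one-time preparation above can be sketched as a small shell function. This is only an illustration, not part of the official setup: the function name link_conda_dir and its two directory arguments are hypothetical, and creating the target directory before linking is an assumption on our part (so that the link never dangles).

```shell
# Hypothetical helper: make <home_dir>/.conda a symlink to <scratch_dir>/.conda.
# The two arguments stand in for $HOME and $SCRATCH.
link_conda_dir() {
    local home_dir="$1" scratch_dir="$2"
    local link="$home_dir/.conda" target="$scratch_dir/.conda"

    # Case 1: the symlink already exists, nothing to do.
    if [ -L "$link" ]; then
        echo "symlink already exists: $link -> $(readlink "$link")"
        return 0
    fi
    # Case 2: a real file or directory is in the way; leave the decision to the user.
    if [ -e "$link" ]; then
        echo "$link exists and is not a symlink; rename or remove it first" >&2
        return 1
    fi
    # Case 3: nothing there yet. Assumption: create the target first so the
    # link points at an existing directory.
    mkdir -p "$target" || return 1
    ln -s "$target" "$link"
}
```

On the Leo systems it would be called as link_conda_dir "$HOME" "$SCRATCH". Running it a second time is harmless because an existing symlink is left untouched.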

At the beginning of each shell session or batch job

Load the Anaconda/Miniconda base module

module load Anaconda3/yyyy.mm/miniconda-base-yyyy.mm

Then, to enable full Conda functionality, issue

source $UIBK_ANACONDA3_PROFILE
conda activate base

This loads the Miniconda base environment. Now conda is a shell function which allows manipulation of your session's environment variables, creation of new environments, and activation of existing environments with conda mechanisms.

Note:

  • Never use conda init because this will modify your $HOME/.bashrc in a way which is incompatible with our modules environment.
  • To obtain a list of all environments (preinstalled and your own) issue the command
    conda env list
  • To obtain a list of packages in an environment, issue
    conda list -n environment
  • We no longer actively support Python 2, which reached its end of life in January 2020. See the "Sunsetting Python 2" article for background information.

Creating And Installing Your Own Environments

If you need additional packages, you should create one or more environments and install the required packages into these environments. To keep installations apart and minimize possible version conflicts, we recommend creating separate environments for different projects requiring disparate packages.

Try to install as many of the missing packages as possible using conda to get optimized versions. The remaining packages can be installed into your environment using the Conda version of Python's pip command or R's install.packages() function (see below).

For all of the following, first, load the Miniconda base environment as described above.

Checking Which Of Your Required Packages Are Already Installed

Start by loading an existing environment that appears to come close to your needs.

For each shell command name that you need, issue
which name
and note which commands have not been found.

For Python packages, write small test programs containing
import name
statements and note which packages cannot be found.

Likewise proceed for R packages with
library('name')
statements.
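When the list of requirements is long, the checks for shell commands and Python packages can be scripted. The following is a minimal sketch, not an official tool: the function names are hypothetical, the shell builtin command -v is used in place of which, and python is assumed to be the interpreter of the currently loaded environment (set the PYTHON variable to override).

```shell
# Print each given command name that is not found on the current PATH.
missing_commands() {
    local name
    for name in "$@"; do
        command -v "$name" >/dev/null 2>&1 || echo "$name"
    done
}

# Print each given Python module that cannot be imported.
# Assumption: "python" is the interpreter of the loaded environment.
missing_python_modules() {
    local name
    for name in "$@"; do
        "${PYTHON:-python}" -c "import $name" >/dev/null 2>&1 || echo "$name"
    done
}
```

For example, missing_commands with a list of command names and missing_python_modules with a list of module names each print only the names that still need to be installed.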

Identifying Missing Packages Which Can Be Installed By Conda

For each required/missing package name, issue

conda search name

Note whether name was found. When your list is complete, issue

conda create -n myenvironment package1 [package2 ...] 

Substitute a good name for myenvironment. Your environment will be created in $HOME/.conda/envs/myenvironment, and all packages requested on the command line (including necessary up- and downgrades of existing packages) are installed into your environment.

While it is possible to install more packages after creating and activating an environment, installing all packages at creation does a better job avoiding version conflicts.
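The search step described above can be run over a whole list of names at once. The sketch below is an illustration only: the function name classify_packages is hypothetical, and it assumes that conda search exits with a non-zero status when no package matches the name in the configured channels.

```shell
# Sort package names into those conda can find and those left for pip
# or install.packages().
# Assumption: "conda search NAME" returns a non-zero exit status when
# nothing matches NAME.
classify_packages() {
    local name
    for name in "$@"; do
        if conda search "$name" >/dev/null 2>&1; then
            echo "conda: $name"
        else
            echo "other: $name"
        fi
    done
}
```

The names reported as "conda:" are the candidates for the conda create command line; the rest are the ones to revisit later with pip or install.packages().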

It is also possible to duplicate an existing environment:

conda create --clone existingenvironment -n myenvironment

In this case, you cannot add packages in the create-command, but need conda install after activating your environment.

Before using your environment, issue

conda activate myenvironment

All remaining Python modules or R libraries (i.e. those which could not be installed using Conda) may be installed into your activated Conda environment using the Conda-specific version of pip (see below) or install.packages().

When you are done installing Conda packages, you may, with your environment still active, use

conda clean --all

to remove unneeded installation material from your environment.

Adding Missing Components Using Pip

If, after installing required packages using Conda, any packages remain, these can be installed into your Conda environment using Conda's version of pip.

First, activate your environment if you have not done so, and use Conda to install pip:

conda install pip gcc_linux-64 gxx_linux-64 gfortran_linux-64

The compilers are necessary because the OS compilers are too old for most current software.

Then, do not create a Python virtual environment (as you would normally do outside Conda), but simply use pip to install the remaining packages into your active Conda environment:

pip install package1 [package2 ...]

This may cause a large number of prerequisites to be installed automatically, many of which may also be available from Conda. To check, add the packages that pip installed to your list of candidate Conda packages, then remove your existing environment:

conda deactivate
conda env remove -n myenvironment

... and repeat the above Conda installation with your extended list. Often, this leaves very few packages to be installed with pip, making optimal use of Conda's performance optimizations.

Note

Once you have used pip in a given Conda environment, you should no longer use Conda to install further packages into that environment. Should this become necessary, start over by creating a new Conda environment and proceed as described above.

Various Hints

Sample Job Fragment

If you have created your own environment(s), you may use the following commands as a template

module load Anaconda3/yyyy.mm/miniconda-base-yyyy.mm
source $UIBK_ANACONDA3_PROFILE
conda activate myenvironment

More On Environments

  • You may also create a Conda environment in a non-standard location using conda create -p path/to/env. Such environments will not be listed by conda env list and need to be remembered.

  • Environments do not nest. While conda deactivate takes you back to the previously activated environment, conda activate newenv replaces the currently activated environment with newenv.
  • For details, see the Conda Managing environments documentation.

Using Your PC To Display a Jupyter Notebook Running On A Server

You can start a Jupyter kernel in a Leo server session and display its dialog in a browser window on your PC.

Windows PC Using Putty

  1. Start new browser window on your PC.
  2. Connect to Leo using Putty.
  3. In the Leo session, load Anaconda and activate your environment as needed.
  4. Start Jupyter:
    jupyter-lab --no-browser
  5. Copy displayed URL
    http://localhost:88xy/?token=zzzzzzzzzzzzzzzzzzzzz
    to your clipboard.
  6. Paste URL into your browser's address field but do not hit enter.
  7. In Putty, create a tunnel:
    1. Control-Right-Click in Putty window, then "Change Settings / Connection / SSH / Tunnels"
    2. Source Port: 88xy from URL
    3. Destination Port: localhost:88xy / Local / IPv4
    4. Add (check if 4L88xy localhost:88xy appears in port list)
    5. Apply
  8. In your browser, activate the URL. Your Jupyter session should be displayed.

Linux PC Using OpenSSH

  1. Start new browser window on your PC.
  2. Connect to Leo using OpenSSH.
  3. In the Leo session, load Anaconda and activate your environment as needed.
  4. Start Jupyter:
    jupyter-lab --no-browser
  5. Copy displayed URL
    http://localhost:88xy/?token=zzzzzzzzzzzzzzzzzzzzz
    to your clipboard.
  6. Paste URL into your browser's address field but do not hit enter.
  7. On your local PC, in another terminal window, create an SSH Tunnel:
    ssh -N -f -L localhost:88xy:localhost:88xy remoteuser@leoX
    Meaning of options:
    -N no remote command, tunnel only
    -f go to background after connecting (asynchronous)
    -L ...define port mapping
  8. In your browser, activate the URL. Your Jupyter session should be displayed.

Jupyter Proxy Script For Linux Workstation

You can automate the creation of the tunnel by creating the following shell script jupyter-proxy in $HOME/bin on your Linux workstation:

cat > $HOME/bin/jupyter-proxy <<'EOF'
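#!/bin/bash
# Open an SSH tunnel for a Jupyter URL copied from the remote session.

case $# in
2)
  userhost="$1"
  url="$2"
  # Accept only URLs of the form http://localhost:PORT/?token=TOKEN
  <<<"$url" grep -qE -e "^http://localhost:[[:digit:]]+/\?token=[[:alnum:]]+$" ||
    { echo >&2 "malformed URL" ; exit 2 ; }
  # Extract the port number from the URL
  port=$(<<<"$url" sed -r 's|^http://localhost:([[:digit:]]+)/\?token=[[:alnum:]]+$|\1|')

  echo "$url"
  echo "hit ^C to terminate tunnel"
  # Forward the local port to the same port on the remote host
  ssh -N -L "localhost:$port:localhost:$port" "$userhost"
;;
*)
  echo >&2 "usage: $0 [user@]host URL"
  exit 2
;;
esac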
EOF
chmod +x $HOME/bin/jupyter-proxy

Usage:

jupyter-proxy user@host 'http://localhost:88xy/?token=zzzzzzzzzzzzzzzzzzzzz'

Documentation and Notes

Anaconda And Conda Web Sites

Links To Other Noteworthy Anaconda Installations

Notes

This is a generic binary installation that should work on all of our microarchitectures. Should any of your Anaconda based jobs or processes abort with Illegal instruction, please let us know, and we will fix this.
