What is Anaconda? Why Anaconda?

Anaconda is a comprehensive, curated, high-quality, and high-performance distribution and package manager for open source software such as Python, R, and many associated packages, intended for use by scientists. It is available for Linux, Windows, and macOS. With the help of dedicated Conda environments, we are able to offer extensive collections of R and Python tools. Conda's version and dependency management ensures that the individual components within an environment are compatible. Moreover, users can easily install their own specialized toolsets into private environments.

Since 2020-03, our Anaconda installation has been based on the Miniconda installer. To avoid version conflicts, we install individual toolsets into separate environments.

Discover all our pre-installed versions of (Ana)conda modules by issuing

module avail Anaconda3

or

module avail Anaconda3/2022.01

to get a list of the most recent modules:

python-3.9.7-anaconda-2022.01
Python 3.9.7 with an extensive collection of more than 370 packages. Among these are NumPy, SciPy, Pandas, and Matplotlib; frontends including IPython, JupyterLab, and Spyder; and compilers/optimizers (e.g. Cython and Numba). The included Intel MKL numerical library ensures optimal performance of numerical methods. This environment may be all you need if you plan to use Python for general purpose scientific data processing and simulations. Our tests have shown that Anaconda's Python outperforms locally compiled versions as well as versions installed from conda-forge.

r-3.6.0-conda-2022.01
The R statistics software with more than 290 R libraries and the RStudio IDE for R.

tensorflow-2.6.0-conda-2022.01 and tensorflow-gpu-2.4.1-conda-2022.01
TensorFlow machine learning tool in CPU and GPU variants from Anaconda. These are less recent than their conda-forge counterparts but perform much better.

pytorch-1.10.1-conda-2022.01
PyTorch (machine learning and GPU-capable replacement for NumPy) downloaded from the pytorch Conda channel. This version runs on GPU as well as CPU.

miniconda-base-2022.01
Conda's base environment - your starting point for individual user extensions.

...
More environments can be easily added to your own account (see below) or centrally upon request if of sufficient general interest.

Using Anaconda On The Leo Systems

The preinstalled Anaconda collections for Python and R are quite comprehensive and may satisfy many of your needs. This is covered in the following section.

If you need to install your own packages, proceed to the section "Extending Anaconda For Your Own Needs".

Using Pre-Installed Environments

To discover which Anaconda environments are installed, issue the command

module avail Anaconda3

For new projects, usually the most recent version is appropriate. The 5.x versions are now considered legacy. Python2 is no longer supported.

To use any of the Anaconda environments, upon login or at the beginning of a batch job, first issue

module load Anaconda3/yyyy.mm/module-name-yyyy.mm

This will let you use the software provided with the respective module, and a restricted version of the conda command.

Issue

conda list

to see which packages are available in the current environment.

Extending Anaconda For Your Own Needs

Important: One-time preparation before you start

Environments are downloaded and built in your $HOME/.conda directory. This can easily grow to many gigabytes, overflowing your $HOME quota. Therefore we recommend that you create .conda in your $SCRATCH and set up a symbolic link $HOME/.conda -> $SCRATCH/.conda.

First check whether $HOME/.conda already exists:

 $ ls -ld $HOME/.conda
ls: cannot access /home/cxxx/ccxxxyyyy/.conda: No such file or directory

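If you get a "No such file or directory" message as shown, $HOME/.conda does not exist yet and you can continue with the next step. If $HOME/.conda is already a symbolic link into $SCRATCH, you are done. If $HOME/.conda is a directory, decide whether you want to rename it (mv -i $HOME/.conda $HOME/.conda-backup), remove it (rm -rf $HOME/.conda), or move it to $SCRATCH (mv -i $HOME/.conda $SCRATCH).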

Now, create the new Conda directory in $SCRATCH if it does not exist yet

mkdir -p $SCRATCH/.conda

Finally, create the symbolic link

ln -s $SCRATCH/.conda $HOME/.conda

At this point, you may also want to redirect your .cache directory to $SCRATCH (assuming it currently contains no valuable data):

cd $HOME
rm -r .cache
mkdir -p $SCRATCH/.cache
ln -s $SCRATCH/.cache $HOME/.cache

Note that you are responsible for backup of your own data in $SCRATCH. Since it is easy to recreate environments, it is usually sufficient to record the necessary steps for successfully creating your environment(s).
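
The one-time preparation above can be collected into a small helper. This is a minimal sketch, not an official tool: the function name link_to_scratch is hypothetical, it assumes that $HOME and $SCRATCH are set, and it deliberately refuses to touch an existing file or directory so that you can decide yourself whether to rename, remove, or move it.

```shell
#!/bin/bash
# Hypothetical helper: redirect $HOME/<name> to $SCRATCH/<name> via a symlink.
# Usage: link_to_scratch .conda
link_to_scratch() {
  name="$1"
  if [ -L "$HOME/$name" ]; then
    # The link is already in place.
    echo "$HOME/$name is already a symbolic link, nothing to do"
    return 0
  fi
  if [ -e "$HOME/$name" ]; then
    # Existing file or directory: leave the decision to the user.
    echo >&2 "$HOME/$name exists; rename, remove, or move it first"
    return 1
  fi
  # Create the target directory in $SCRATCH, then the link in $HOME.
  mkdir -p "$SCRATCH/$name" && ln -s "$SCRATCH/$name" "$HOME/$name"
}
```

Calling link_to_scratch .conda (and, if desired, link_to_scratch .cache) performs the same steps as the commands above, and is safe to repeat.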

At the beginning of each shell session or batch job

Load the Anaconda/Miniconda base module

module load Anaconda3/yyyy.mm/miniconda-base-yyyy.mm

Then, to enable full Conda functionality, issue

source $UIBK_CONDA_PROFILE
conda activate base

This loads the Miniconda base environment. Now conda is a shell function which allows manipulation of your session's environment variables, creation of new environments, and activation of existing environments with conda mechanisms.

Note:

  • Never use conda init because this will modify your $HOME/.bashrc in a way which is incompatible with our modules environment.
  • To obtain a list of all environments (preinstalled and your own) issue the command
    conda env list
  • To obtain a list of packages in an environment, issue
    conda list -n environment
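
In batch jobs it can help to fail early if the profile has not been sourced. The following is a minimal sketch; the function name conda_is_ready is hypothetical. It relies on the fact stated above that conda is a shell function once the profile has been sourced, and on command -v printing only the bare name for shell functions (and a full path for executables).

```shell
#!/bin/bash
# Hypothetical check: succeeds only if conda is defined as a shell function
# in the current session. command -v prints "conda" for a function and an
# absolute path for an executable found on PATH.
conda_is_ready() {
  [ "$(command -v conda 2>/dev/null)" = "conda" ]
}
```

A job script could then contain a line such as: conda_is_ready || { echo >&2 "conda profile not sourced" ; exit 1 ; }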

Creating And Installing Your Own Environments

If the pre-installed environments do not meet your needs, you should create one or more environments and install the required packages into these environments. To keep installations apart and minimize possible version conflicts, we recommend creating separate environments for different projects requiring disparate packages.

Try to install as many of the missing packages as possible using conda to get optimized versions. The remaining packages can be installed into your environment using the Conda version of Python's pip command or R's install.packages() function (see below).

For all of the following, first, load and activate the Miniconda base environment as described above.

Creating Your New Conda Environment

You typically should have a good notion of the packages necessary to run your software, either from the prerequisites section of the software documentation, or from the import statements of your programs.

Given a tentative list of required packages, create your new environment:

conda create -n myenvironment package1 [package2 ...] 

Substitute a good name for myenvironment. Your environment will be created in $HOME/.conda/envs/myenvironment, and all packages requested on the command line are installed into your new environment.

Then activate your environment

conda activate myenvironment

and try to run your software. You may also want to explicitly look for missing components by trying to invoke them:

For each shell command name that you need, issue
which name
and note which commands have not been found.

For Python packages, write small test programs containing
import name
statements and note which packages cannot be found.

Likewise proceed for R packages with
library('name')
statements.
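
The checks for shell commands and Python packages can be sketched as two small shell functions. This is only an illustration: the function names are hypothetical, command -v is used in place of which, and python is assumed to be the interpreter of the currently active environment.

```shell
#!/bin/bash
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH.
missing_commands() {
  for name in "$@"; do
    command -v "$name" >/dev/null 2>&1 || echo "$name"
  done
}

# Hypothetical helper: print every Python module from the argument list
# that the python interpreter of the active environment cannot import.
missing_python_modules() {
  for name in "$@"; do
    python -c "import $name" >/dev/null 2>&1 || echo "$name"
  done
}
```

For example, missing_commands followed by the names of the programs you need, and missing_python_modules followed by the module names from your import statements, each print the names that still have to be installed.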

Identifying Missing Packages Which Can Be Installed By Conda

For each required/missing package name, issue

conda search name

Take note if name was found. When your list is complete, reiterate your installation using the new set of packages

conda activate base
conda env remove -n myenvironment
conda create -n myenvironment package1 [package2 ...]
conda activate myenvironment

While it is possible to install more packages after creating and activating an environment, installing all packages at creation does a much better job at avoiding version conflicts.

Repeat this process until no new packages can be installed.
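
The search over a list of candidate packages can be sketched as a loop. This is a hedged illustration, not part of the official setup: the function name classify_packages is hypothetical, and the sketch assumes that conda search exits with a non-zero status when no matching package is found.

```shell
#!/bin/bash
# Hypothetical helper: for each argument, print "found NAME" if conda search
# succeeds and "missing NAME" otherwise.
# Assumption: conda search returns a non-zero exit status when nothing matches.
classify_packages() {
  for name in "$@"; do
    if conda search "$name" >/dev/null 2>&1; then
      echo "found $name"
    else
      echo "missing $name"
    fi
  done
}
```

The names reported as found go into the package list for the next conda create; the names reported as missing are candidates for pip or install.packages().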

All remaining Python modules or R libraries (i.e. those which could not be installed using Conda) may be installed into your activated Conda environment using the Conda-specific version of pip (see below) and install.packages().

Adding Missing Components Using Pip

If, after installing required packages using Conda, any packages are still missing, these can be installed into your Conda environment using Conda's version of pip.

First, activate your environment if you have not done so

conda activate myenvironment

and use Conda to install pip and Conda's compiler environment:

conda install pip gcc_linux-64 gxx_linux-64 gfortran_linux-64

The compilers are necessary because the OS compilers are too old for most current software.

Then, do not create a Python virtual environment (as you would normally do outside Conda), but simply use pip to install the remaining packages into your active Conda environment:

conda activate myenvironment
pip install package1 [package2 ...]

This may cause a large number of dependencies to be installed automatically, many of which could have been installed by Conda instead. To check, add the packages installed by pip to your list of candidate Conda packages, then remove your existing environment:

conda deactivate
conda env remove -n myenvironment

... and repeat the Conda installation described above with your extended list. Often, this leaves very few packages to be installed with pip, making optimal use of Conda's performance optimizations.
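
To collect the names of the packages that pip installed, run something like the following sketch before removing the environment. The function name pip_installed_packages is hypothetical, and the sketch assumes that conda list marks pip-installed packages with pypi in its last (channel) column and begins its header lines with #.

```shell
#!/bin/bash
# Hypothetical helper: print the names of packages that were installed by pip
# into the currently active environment.
# Assumption: conda list shows "pypi" in the last column for such packages,
# and header lines start with "#".
pip_installed_packages() {
  conda list | awk '!/^#/ && $NF == "pypi" { print $1 }'
}
```

Each printed name can then be checked with conda search and, if found, added to the package list for the next conda create.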

Note

After you have used pip in a given Conda environment for the first time, you should no longer use Conda to install more packages into that environment. Should this become necessary, start over by creating a new Conda environment and proceeding as described above.

Final Cleanup

When you are done installing Conda packages, you may, with your environment still active, use

conda clean --all --yes

to remove unneeded installation material from your environments.

Various Hints

Mamba

If you install your own packages, you may want to try mamba [create|install] instead of conda. Mamba is much faster than conda, but it uses a different resolver and is in an early stage of development, so your mileage may vary.

Alternate Channels

If your software cannot be installed from the default Anaconda repository, you may install it either using pip as described above, or you may want to try a different channel using the -c channel argument for conda create.

A very popular channel, which contains a huge number of additions over Anaconda's default channel as well as more recent versions of packages also found in default, is conda-forge. Note, however, that conda-forge appears to have no quality monitoring, so you may end up with unreliable or poorly performing packages.

Another channel which contains many Bioinformatics packages is bioconda.

Sample Job Fragment

If you have created your own environment(s), you may use the following commands as a template

module load Anaconda3/yyyy.mm/miniconda-base-yyyy.mm
source $UIBK_ANACONDA3_PROFILE
conda activate myenvironment

More On Environments

  • You may also create a Conda environment in a non-standard location using conda create -p path/to/env. Such environments will not be listed by conda env list and need to be remembered.

  • Environments do not nest. While conda deactivate takes you to a previously activated environment, conda activate newenv will replace the currently activated environment with the newenv.
  • For details, see the Conda Managing environments documentation.

Using Your PC To Display a Jupyter Notebook Running On A Server

You can start a Jupyter kernel in a Leo server session and display its dialog in a browser window on your PC.

Windows PC Using PuTTY

  1. Start a new browser window on your PC.
  2. Connect to Leo using PuTTY.
  3. In the Leo session, load Anaconda and activate your environment as needed.
  4. Start Jupyter:
    jupyter-lab --no-browser
  5. JupyterLab will display several URLs. Identify the following URL:
    http://localhost:88xy/?token=zzzzzzzzzzzzzzzzzzzzz
  6. In PuTTY, create a tunnel:
    1. Control-Right-Click in the PuTTY window, then "Change Settings / Connection / SSH / Tunnels"
    2. Source port: 88xy from the URL
    3. Destination: localhost:88xy
      and make sure the radio buttons Local and IPv4 are selected.
    4. Hit the Add button (check that 4L88xy localhost:88xy appears in the port list).
    5. Hit the Apply button.
  7. Copy and paste the above URL into your browser's address field and hit Enter. Your Jupyter session should be displayed.

Linux PC Using OpenSSH

  1. Start a new browser window on your PC.
  2. Connect to Leo using OpenSSH.
  3. In the Leo session, load Anaconda and activate your environment as needed.
  4. Start Jupyter:
    jupyter-lab --no-browser
  5. JupyterLab will display several URLs. Identify the following URL:
    http://localhost:88xy/?token=zzzzzzzzzzzzzzzzzzzzz
  6. On your local PC, in another terminal window, create an SSH tunnel:
    ssh -N -f -L localhost:88xy:localhost:88xy remoteuser@leoX
    Meaning of options:
    -N no remote command, tunnel only
    -f run ssh in the background
    -L define the port mapping (local port forwarding)
  7. Copy and paste the above URL into your browser's address field and hit Enter. Your Jupyter session should be displayed.

Jupyter Proxy Script For Linux Workstation

You can automate the creation of the tunnel by creating the following shell script jupyter-proxy in $HOME/bin on your Linux workstation:

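mkdir -p "$HOME/bin"   # make sure the target directory exists
cat > "$HOME/bin/jupyter-proxy" <<'EOF'
#!/bin/bash
# Open an SSH tunnel for the port named in a jupyter-lab URL.

case $# in
2)
  userhost="$1"
  url="$2"

  # Accept only URLs of the form printed by jupyter-lab.
  <<<"$url" grep -qE -e "http://localhost:[[:digit:]]+/\?token=[[:alnum:]]+" ||
    { echo >&2 "malformed URL" ; exit 2 ; }
  # Extract the port number from the URL.
  port=$(<<<"$url" sed -r 's|(http://localhost:([[:digit:]]+)/\?token=[[:alnum:]]+)|\2|')

  echo "$url"
  echo "hit ^C to terminate tunnel"
  # Forward the local port to the same port on the remote host.
  ssh -N -L "localhost:$port:localhost:$port" "$userhost"
;;
*)
  echo >&2 "usage: $0 [user@]host URL"
  exit 2
;;
esac
EOF
chmod +x "$HOME/bin/jupyter-proxy"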

Usage:

jupyter-proxy user@host http://localhost:88xy/?token=zzzzzzzzzzzzzzzzzzzzz
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                        paste this from the jupyter-lab output

This also works on Windows PCs with WSL or Cygwin installed.

Documentation and Notes

License Information

Anaconda is free for use by individuals for non-profit academic purposes. If you are in doubt whether your planned use falls within this category, please read the End User License Agreement.

Anaconda And Conda Web Sites

Python 2 Legacy Code

We no longer support Python2 because it is obsolete as of January 2020. See the "Sunsetting Python 2" article for background information.

If you still have legacy code written in Python2, it will likely be possible to automatically convert large portions of your code using tools such as 2to3. Since Python2 and Python3 have a few semantically undecidable incompatibilities (e.g. string handling, generator functions vs. functions returning lists), you may need to apply a few manual corrections after automatic conversion to get your code to run and perform well. In our experience with a few projects, the effort for a successful conversion is not very high.

Links To Other Noteworthy Anaconda Installations

Notes

This is a generic binary installation that should work on all of our microarchitectures. Should any of your Anaconda based jobs or processes abort with Illegal instruction, please let us know, and we will try to fix this.
