Singularity

This version is obsolete but still contains some material on the rationale of containers in scientific computing.
To learn how to set up and use containers in the UIBK HPC environment, please go to the updated Singularity 3.8+ description.

Announcements

Singularity 2.4

A major revision, Singularity 2.4, has been released and is being rolled out on our HPC systems (status 2017-11-23: 2.4.1 on LEO3 and LEO3E). The new release introduces substantial new functionality but should still be able to run containers built with previous versions. Visit the Singularity Home Page to learn more about the new release.

Before upgrading your local copy of Singularity, you might want to cleanly uninstall your current version from /usr/local: simply cd to the directory into which you extracted the old version (e.g. cd $HOME/src/singularity-x.y.z) and then issue sudo make uninstall. After this, proceed with the new installation.
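For example, the uninstall step might look like this (the source directory is an example; replace x.y.z with your installed version):

cd $HOME/src/singularity-x.y.z
sudo make uninstall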

Please note:

  • Singularity 2.4 includes major changes (new container formats, new build command etc.), which are not reflected in this document yet. Feedback from our user community is welcome.
  • On Leo3, due to an old kernel, the
    uibk-helper mkscratch containername.img
    command does not work with the new sqfs format containers. Either use the traditional img format (not recommended) or create the /scratch directory while you are building your container image (recommended).
  • If you plan to use Singularity on MACH2, please contact the JKU system administration team.

Use Singularity 2.3.2 or later

Please note: Due to a change in the Docker image metadata format, all Singularity versions up to and including 2.3.1 fail to download most images from Docker Hub. A new minor version, 2.3.2, is available from Singularity Releases. You need 2.3.2 or later for all activities involving downloads from Docker Hub.

See also Singularity in Google Groups for announcements.

Persistent Containers: DON'T

Singularity 2.4 has added the possibility to start persistent containers using the instance subgroup of commands. Persistent containers interfere with the operation of the batch system and are NOT WELCOME on our server systems. So, please refrain from using Singularity instances.
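If you have started instances by mistake, you can find and stop them with the instance subcommands (Singularity 2.4 syntax; instancename is a placeholder, shown here only for cleanup):

singularity instance.list
singularity instance.stop instancename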

User Defined Software Environments

Why Containers in Scientific Computing?

With growing user bases and increasing adoption of computational methods by sciences not traditionally associated with High Performance Computing, large HPC installations face an increasing complexity of software to be maintained, with sometimes dozens of prerequisite libraries in various, often conflicting, combinations. Besides the effort needed to maintain these software installations (even when using the environment modules system), there is a growing gap between expectations of stability (meaning slow adoption of new software, libraries, and drivers) and innovation.

From a user perspective, starting to use a new system in a classical manner requires you to port all your software to the new system, including testing and optimization. Differing software and library versions raise questions of reproducibility of scientific results, contributing to a phenomenon known as the Replication Crisis.

Software containers address both problems. They can be used to bundle software with all needed prerequisites into an isolated environment, which can be executed on arbitrary (reasonably recent) machines completely independently of locally installed software. Porting your software to a new machine then simply means copying your container to that machine.

Among the many possible benefits and uses of containers in various application areas, the key aspect for scientific computing is the focus on providing a software tree fully controlled by the user, while still providing full access to the other components of the system on which this software will run.

Singularity

Singularity is a runtime system that allows execution of user-defined software environments (containers, e.g. from Docker) in an HPC cluster with full integration into the cluster infrastructure (including access to HOME and SCRATCH directories, environment variables, network communication of MPI programs, and the batch environment).

The Docker runtime environment, while easy to set up on users' workstations, is strongly geared towards PaaS deployments (Platform as a Service, common for web, database, and similar applications) and is unsuitable for HPC clusters. Docker requires a complex server and storage setup. The commands to be executed at container runtime are determined at container setup time, and there is no straightforward integration into conventional load management systems such as SGE, PBS, or SLURM. Although Docker attempts to completely isolate containers from their runtime host, a user needs administrative privileges for even the least demanding uses of containers.

A decisive advantage of Docker is the availability of innumerable community-provided containers, many of them prepopulated with complete installations of various software products. A successful alternative execution environment should leverage this community effort, and Singularity does.

In contrast to Docker, Singularity allows non-privileged users to run containers on HPC clusters as if they were normal executable programs. No administrative privileges are necessary to run a program in a Singularity container. Beyond installation and configuration of the Singularity software and providing enough space for users to store their containers in $SCRATCH, Singularity needs no central server or storage setup. Users can easily start multiple instances of the same container with arbitrary command lines, and access their data in $HOME and other directories, environment settings, etc. with no arbitrary restrictions.

Integration of Singularity into the batch environment is trivial because Singularity containers run like normal programs on the host. To run MPI programs, you use the host's mpirun command to start multiple (possibly distributed) instances of the container. The only restriction is that the same version of OpenMPI is required inside and outside the container.
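A quick way to check this version match is to compare the output of mpirun --version on the host and inside the container (a sketch; container.img is a placeholder name):

mpirun --version
singularity exec container.img mpirun --version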

Despite its different execution model, Singularity is integrated into the Docker infrastructure. Unmodified Docker containers may be pulled from Docker Hub for direct execution on HPC hosts. Alternatively, users may pull containers to their local Linux workstation (or virtual machine) and modify them according to their needs using a local Singularity (or Docker) installation. Changing container contents requires root privileges, which is why these activities need a workstation under the user's control. The resulting container image may then be transferred to the HPC server and executed interactively or in batch jobs.

Technically, a Singularity container image is a flat file containing an image of the container's directory tree (which typically includes parts of a Linux OS installation, the user's software, static data, and libraries) and some metadata. When Singularity starts a container, it loop-back mounts the container image, replaces the host's root file system with the contents of the container, and starts the program given in the "exec" statement (or a shell) as a normal user process. Programs running in the container have access to the user's home directory (more directories can be specified in the configuration or on the command line), environment variables, the system's device files, and network connections, and so may make full use of the cluster infrastructure while still running in an isolated software environment.
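For illustration, a program started with singularity exec sees the host's home directory and environment variables (a sketch; container.img is a placeholder name):

singularity exec container.img ls -l $HOME
singularity exec container.img env | grep HOME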

The Singularity quick start guide is a good starting point for experimentation. Normally, you will want to install the latest release. Similar to the use of Dockerfiles, the setup of Singularity containers may (and, for production purposes, should) be automated using a bootstrap file.

Singularity 2.3.2 Workflow

Please note: The details of the workflow have changed with version 2.4. For up-to-date information, please visit the Singularity documentation and in particular the section on best practices, Singularity Flow.

Usage details and various workflows are described in the literature, see below.

A typical workflow may look like this:

  1. (One-time preparation, optional) Download and install Singularity on your workstation. See Singularity Home Page. This step is only needed if you plan to modify containers.
  2. On your workstation, create the container image
    singularity create container.img
  3. Import an existing container from Dockerhub to your image
    sudo singularity import container.img docker://ubuntu:latest
    Note: you may use containers based on any Linux release as long as the kernels used on your machine and the target machine are compatible with the release.
  4. Modify your image:
    sudo singularity shell -w container.img
    starts a shell and allows you to install software, copy files etc. All modifications to the container will be permanent.
  5. The preceding two steps may (and should) be automated by bootstrapping the container. A moderately complex example is given below.
  6. Copy the container image to your SCRATCH directory on the HPC cluster.
  7. On the HPC cluster, run commands inside the container
    singularity exec container.img command [arg ...]
    Since the container image is accessed read-only, you may run an arbitrary number of instances (working e.g. on different tasks) on individual or multiple cluster nodes. I/O should be done on your $SCRATCH directory.
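For reference, the workstation-side part of this workflow (steps 2 to 6) condenses to a few commands (a sketch; the image name, Docker source, and target path are examples):

# on your Linux workstation
singularity create container.img
sudo singularity import container.img docker://ubuntu:latest
sudo singularity shell -w container.img   # interactive modifications, or use a bootstrap file instead
scp container.img cXXXyyyy@leo3e:/scratch/cXXXyyyy/container.img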

Implementation on Leo3 and Leo3e (Version 2.3.2)

Please note: Singularity usage has been significantly changed with version 2.4. The following description is valid only for version 2.3.2.

Setup

To use Singularity, load the singularity module

module load singularity/2.3.2

After this, all Singularity commands that do not modify containers are available. To change containers, you need a personal computer running Linux (or a virtual Linux machine) with root access.

As of October 2017, we have Singularity version 2.3.2 installed on Leo3 and Leo3e. In November 2017, version 2.4 was added on Leo3e.

We recommend that you put your container images somewhere under your $SCRATCH directory. If you are on Leo3, please add the Scratch mount point to your container (see below) before beginning to use it. To start a shell in your container, issue

singularity shell containername.img

To run a program, use a command line similar to

singularity exec containername.img programname [arg ...]

This is the typical usage in the context of SGE batch jobs. Since container processes are children of the shell issuing the singularity command, they will run cleanly under SGE control and can be terminated with normal SGE means (qdel jobid).
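A minimal SGE batch script for a sequential container run might look like this (a sketch; job name, image name, and program are placeholders):

#!/bin/bash
#$ -N singtest_serial
#$ -cwd

module load singularity/2.3.2
singularity exec containername.img programname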

The command

singularity help [command]

gives an overview and specific information on singularity commands.

Our setup includes automatic access to your $HOME and $SCRATCH directories.
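If you need additional host directories inside the container, they can be bind mounted at runtime with Singularity's -B option, provided the corresponding mount point exists in the image and the site configuration permits user binds (a sketch; paths are examples):

singularity exec -B /host/data:/data containername.img programname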

Using $SCRATCH on Leo3

Containers will run out of the box on Leo3e. On Leo3, which has an older operating system, a one-time preparation is necessary before running the container for the first time: you need to manually create the mount point for the /scratch file system. If your container does not have this directory, a command

uibk-helper mkscratch containername.img

will create this mount point. Afterwards, the container can be used like on Leo3e.
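If you build your own image, you can avoid this extra step by creating the mount point at build time, e.g. with a single line in the %post section of a bootstrap file such as the one shown below (a sketch):

# in the %post section of the bootstrap file
mkdir -p /scratch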

Remarks

Current level of support: Test installation. Sequential or shared memory parallel containers should work fine. OpenMPI Infiniband integration has been tested on Leo3 and Leo3e for OpenMPI 1.10.2 and achieves bandwidths identical to programs running natively on the host (or even better).
We are looking for test users. User feedback is welcome.

Note: At the ZID UIBK HPC Workgroup, we have evaluated Singularity against various alternatives (among them Docker, LXC, Charliecloud, Shifter, and Udocker). Why Singularity is the most suitable of these for running containers on HPC clusters can be read in reference [2] below. At first sight, Charliecloud [4] looks like an even more lightweight alternative, but it relies on kernel features that are unavailable or unstable on our current Leo3(e) machines, and it depends on users having Docker set up on their workstations.

Case Study: Using MPI with Infiniband on the HPC clusters Leo3 and Leo3e

Please note: Singularity usage has been significantly changed with version 2.4. The following procedure is valid only for version 2.3.2.

Singularity containers have access to /dev files, so MPI programs running in containers can use the Infiniband transport layer of OpenMPI. To run MPI programs in Singularity containers, we need the following prerequisites:

  • OpenMPI installations in host and container must match
  • Infiniband utilities and drivers installed in the container
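After building the container (see below), the Infiniband prerequisite can be verified on a compute node by querying the verbs devices from inside the container (a sketch; ibv_devinfo is part of the ibverbs-utils package installed in the bootstrap below):

singularity exec ubuntu-mpi.img ibv_devinfo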

What follows are, as an example, the steps needed to get the Ohio State University MPI benchmark to run on Leo3(e) in an Ubuntu container. Modify this workflow according to your needs. It also serves as an example of automating the setup of a slightly nontrivial container using the bootstrap method.

On your Linux workstation

  1. Download and install Singularity. See Singularity Home Page.
  2. Prepare the following boot file ubuntu-mpi.bootstrap
    Bootstrap: docker
    From: ubuntu:latest
    
    %runscript
    echo this is the runscript
    
    %setup
    # this section is executed in host environment.
    # container root can be accessed via $SINGULARITY_ROOTFS
    
    set -ex
    
    # in version 2.3, the %files section is executed AFTER the %post script,
    # so we need to copy data here. this is expected to be fixed in 2.4
    
    mkdir $SINGULARITY_ROOTFS/data
    
    # uncomment and modify according to needs
    # cp -rp /home/myuser/singularity/data/* $SINGULARITY_ROOTFS/data
    
    %files
    # empty because files are copied too late in singularity 2.3
    
    %environment
    
    %labels
    AUTHOR Michael.Fink@uibk.ac.at
    
    %post
    # this section is executed within the container
    # add all commands needed to setup software
    
    set -ex
    apt update
    apt -y upgrade
    
    ## install some general purpose packages and compiler environment
    apt -y install apt-utils apt-file vim less gcc g++ make wget openssh-client
    
    ## install openmpi and support utilities and drivers for infiniband
    ## as of July 2017, openmpi 1.10.2 is installed.
    ## since you will be using the host-side mpirun, you need the same version on the host
    apt -y install openmpi-bin libopenmpi-dev infiniband-diags ibverbs-utils libibverbs-dev
    apt -y install libcxgb3-1 libipathverbs1 libmlx4-1 libmlx5-1 libmthca1 libnes1
    
    # apt-file update
    
    ## install Ohio State University MPI benchmarks
    cd /data
    wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.3.2.tar.gz
    tar xf osu-micro-benchmarks-5.3.2.tar.gz
    cd osu-micro-benchmarks-5.3.2
    ./configure --prefix=/data CC=$(which mpicc) CXX=$(which mpicxx)
    make
    make install
    
    This bootstrap file will pull a basic Ubuntu image from Docker Hub and install some utilities, the GNU compiler, MPI support, and the OSU benchmark in the container. Note that the OS running on Leo3(e) is CentOS, while the container OS used in this example is Ubuntu.
  3. Create an empty container image.
    singularity create ubuntu-mpi.img
  4. Bootstrap the container image using the bootstrap file created in step 2. You need root to do this. Depending on your network connection, the bootstrap will take several minutes. Watch the output. If errors occur, remove the image, correct the bootstrap file, and start again at step 3.
    sudo singularity bootstrap ubuntu-mpi.img ubuntu-mpi.bootstrap
    The above bootstrap file was tested and worked with Singularity 2.3.1 on Leo3e on 27 July 2017.
  5. After the bootstrap is successful, copy the container image to your Leo3(e) scratch directory
    scp ubuntu-mpi.img cXXXyyyy@leo3e:/scratch/cXXXyyyy/ubuntu-mpi.img

On Leo3 or Leo3e

  1. Log on to Leo3(e) and cd to your $SCRATCH directory.
  2. Issue module load singularity/2.3.2
  3. Check for the container image ls -l ubuntu-mpi.img
  4. On Leo3, issue uibk-helper mkscratch ubuntu-mpi.img
  5. Check local shared memory execution of the MPI benchmark. As of July 2017, Ubuntu comes with OpenMPI 1.10.2, so this is the version we need to load in this test.
    module load openmpi/1.10.2 singularity/2.3.2
    mpirun -np 2 singularity exec ubuntu-mpi.img /data/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
    
    You should get bandwidths of up to approx. 4 GB/s.
  6. Prepare an SGE batch job
    #!/bin/bash
    #$ -N singtest
    #$ -pe openmpi-1perhost 2
    #$ -cwd
    
    module load openmpi/1.10.2 singularity/2.3.2
    mpirun -np 2 singularity exec ubuntu-mpi.img /data/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
    
    Note that the singularity commands are placed on the execution nodes using the host's mpirun command. In the output file, you should see bandwidths up to approx. 6 GB/s.

Sample output for OSU MPI benchmark

$ cat singtest.o592416
# OSU MPI Bandwidth Test v5.3.2
# Size      Bandwidth (MB/s)
1                       3.70
2                       7.28
4                      14.65
8                      28.99
[...]
262144               6197.92
524288               5913.92
1048576              6021.13
2097152              5964.13
4194304              6098.53

This should provide you with the information necessary to implement your own workflow.

Additional OpenMPI libraries can be readily installed upon request.

Literature

  1. Singularity Homepage
  2. Singularity: Scientific containers for mobility of compute.
    This excellent paper describes the rationale of Singularity in the context of user-defined software environments and reproducibility of research, discusses the pros and cons of various alternatives, and gives an introduction to its usage.
  3. Docker Homepage
  4. Charliecloud: Unprivileged containers for user-defined software stacks in HPC.
    Charliecloud reference paper.