LEO5: Introduction and Overview

LEO5 Project

1 Feb 2023: Start of Friendly User Test Operation
1 May 2023: Start of Regular Operation

LEO5 Key Features

The system consists of 63 compute nodes, each with two 32-core Intel Xeon 8358 (Icelake) sockets (i.e. 64 cores/node). All nodes are connected by a uniform high-performance InfiniBand (IB HDR100) interconnect.

LEO5 Hardware

Node Type	# Nodes	Cores	System Memory	GPUs	GPU Memory per GPU
Login	1	64	2 TB	2 x A30	24GB
Standard	28	64	256 GB	-	-
GPU A30	21	64	256 GB	2 x A30	24GB
GPU A40	2	64	256 GB	2 x A40	48GB
GPU A100	6	64	256 GB	1 x A100	40GB
Big Memory	6	64	2 TB	-	-
Total	64	4096	28.8 TB	54 (44 x A30 + 4 x A40 + 6 x A100)	-
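
The totals in the last table row can be reproduced from the per-type rows. The following is a minimal sketch of that arithmetic; the variable names are chosen here for illustration only, and the numbers are copied from the table.

```bash
# Node counts per type, copied from the hardware table.
login=1; standard=28; gpu_a30=21; gpu_a40=2; gpu_a100=6; bigmem=6
cores_per_node=64

# Sum the nodes, then derive the core count.
total_nodes=$(( login + standard + gpu_a30 + gpu_a40 + gpu_a100 + bigmem ))
total_cores=$(( total_nodes * cores_per_node ))

# GPUs: 2 per login, A30 and A40 node, 1 per A100 node.
total_gpus=$(( login * 2 + gpu_a30 * 2 + gpu_a40 * 2 + gpu_a100 * 1 ))

echo "nodes=$total_nodes cores=$total_cores gpus=$total_gpus"
```

The 44 A30 cards in the total are the 42 cards in the GPU A30 nodes plus the 2 cards in the login node.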

$SCRATCH net storage capacity: 1.8 PB (104 TB SSDs) - file system: IBM Storage Scale 5.1.5
Total performance (nominal):
CPU: 235 TFlop/s (FP64)
GPU: 300 TFlop/s (FP64), 740 TFlop/s (FP32), 10 PFlop/s (FP16 Tensor)

This configuration includes two extensions: one funded by a financial contribution from the Research Area Scientific Computing, and a dedicated extension (16 nodes, 320 TB storage) that was added at the request of University top management and financed from the investment funds of a few research groups. Because these otherwise dedicated resources are fully integrated into the shared infrastructure, the general community can profit from the investments in times of low usage.

LEO5 GPUs

GPU Type	SDK Version	Capability
A30	>= 11.0	8.0
A40	>= 11.1	8.6
A100	>= 11.0	8.0

LEO5 Software

Compared to our previous LEO systems, the software setup has been modernized to reflect current standards:

The Linux distribution Rocky Linux 8.6 is a clone of RHEL 8.6. Rocky receives substantial support from major players in the industry and is fully compatible with all third-party software used in our installation.
We continue using the high performance parallel file system Spectrum Scale (formerly GPFS), which is in use by many TOP 500 installations, including LRZ and VSC.
We have replaced the obsolete SGE load manager with Slurm, which is now standard at most major HPC installations worldwide. In particular, our Slurm configuration natively supports hybrid multithread/MPI workloads, offers substantially improved memory management, and gives users the option to employ hardware hyperthreads for suitable workloads.
Deployment of most of the standard software has been automated using the Spack HPC software package manager. Spack is being used by many major HPC sites and will allow us to satisfy many basic software installation requests quickly.
For software involving Python and R, we continue using Conda.
Some software will still be installed manually. Due to reduced workforce, deployment of some domain-specific software products may need to be delegated to institutes.

Notes on Software Installed by Spack

Software is still accessed via Environment Modules.
In prior LEO systems, variants of libraries were loaded implicitly, depending on which compiler toolchain (e.g. GNU, Intel, ...) had been loaded. This implicit dependency no longer exists. All module names explicitly contain the compiler with which they were built.
Format of module names:
softwarename/softwareversion-compiler-compilerversion-hash
Example: zlib/1.2.13-gcc-8.5.0-xlt7jpk
All module names begin with the name of the software, followed by a slash, the software version, and the compiler toolchain, and end with a seven-character hash. The hash makes it possible to distinguish between variants whose differences are not evident from the module name. Some module names may contain further software dependencies (e.g. Python, MPI).
To find out with which specifications a module was built, enter module help modulename
To compare the specifications of two given modules, first add the UIBK utilities to your PATH (module load uibk-util), then issue moddiff [-s] mod1 mod2. (Note: if moddiff is run with just one argument, the specifications are listed line by line.)
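
To illustrate the naming scheme, the following sketch splits a module name into its parts using plain shell parameter expansion. The function name parse_module_name is made up for this example and is not part of the UIBK utilities. It assumes the simple form softwarename/softwareversion-compiler-compilerversion-hash and does not handle names that carry further dependency fields (e.g. Python, MPI).

```bash
# Split a Spack-generated module name into its parts.
# Assumes the simple form name/version-compiler-compilerversion-hash.
parse_module_name() {
    local full="$1"
    local name="${full%%/*}"              # text before the first slash
    local rest="${full#*/}"               # version-compiler-compilerversion-hash
    local hash="${rest##*-}"              # text after the last dash
    rest="${rest%-*}"                     # drop the hash
    local compiler_version="${rest##*-}"  # last remaining field
    rest="${rest%-*}"                     # drop the compiler version
    local compiler="${rest##*-}"          # last remaining field
    local version="${rest%-*}"            # whatever is left is the version
    printf 'name=%s version=%s compiler=%s compiler_version=%s hash=%s\n' \
        "$name" "$version" "$compiler" "$compiler_version" "$hash"
}

parse_module_name "zlib/1.2.13-gcc-8.5.0-xlt7jpk"
# prints: name=zlib version=1.2.13 compiler=gcc compiler_version=8.5.0 hash=xlt7jpk
```

For the authoritative build specification of a module, module help modulename remains the reference; the sketch only decodes what is visible in the name.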

At runtime, you only need to load modules that make shell-level commands available. Library dependencies are encoded in executables using the RPATH attribute. Our UIBK Spack installation also sets the CPATH and LD_RUN_PATH variables, so executables that you build with Spack-built compilers and libraries will also contain RPATH entries that uniquely identify their libraries at runtime.
Please Note: Unfortunately, this mechanism currently works only when using the OS-supplied compilers. For a detailed description and a temporary workaround, please have a look at the entry in the Known-Problems section.
In keeping with the Spack recommendations we no longer set the environment variable LD_LIBRARY_PATH because this would also affect the behaviour of system utilities. Instead, the variable LIBRARY_PATH is set, which contains the search path for -lxxx arguments at link time. However, not all compiler front ends honor LIBRARY_PATH, so it may be necessary in some cases to export LD_LIBRARY_PATH=$LIBRARY_PATH or use the option -L${LIBRARY_PATH} when linking your programs.
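
One caveat when passing LIBRARY_PATH to the linker by hand: LIBRARY_PATH is a colon-separated list of directories, whereas each -L option takes a single directory. Once several modules are loaded, the list usually has more than one entry and needs to be expanded into one -L flag per directory. The helper below is a minimal sketch of that expansion; the function name path_to_L_flags and the example directories are hypothetical.

```bash
# Turn a colon-separated search path into one -L flag per directory.
# Empty entries (e.g. from "a::b" or a trailing colon) are skipped.
path_to_L_flags() {
    local remaining="$1"
    local flags=""
    local dir
    while [ -n "$remaining" ]; do
        dir="${remaining%%:*}"                    # first entry of the list
        case "$remaining" in
            *:*) remaining="${remaining#*:}" ;;   # drop first entry and colon
            *)   remaining="" ;;                  # that was the last entry
        esac
        if [ -n "$dir" ]; then
            flags="$flags -L$dir"
        fi
    done
    printf '%s\n' "${flags# }"                    # strip the leading space
}

path_to_L_flags "/opt/a/lib:/opt/b/lib64"
# prints: -L/opt/a/lib -L/opt/b/lib64

# Possible use at link time (main.c is a placeholder source file):
#   gcc main.c $(path_to_L_flags "$LIBRARY_PATH") -lz
```

The command substitution is left unquoted on purpose so that each flag becomes a separate argument; this assumes the directories contain no whitespace.
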
New Spack versions are released approximately twice per year, supporting new versions of existing software and new functionality. We will make these versions available to users by installing new Spack release-instances. We try to keep these reasonably complete. After login, your MODULEPATH will always refer to the most recent Spack release-instance.
When users request individual software versions that are more recent than those provided by the stable Spack releases, we will install additional instances based on the develop version of Spack as the need arises.
All available Spack instances may be listed by issuing
module avail spack
To access any given Spack instance or directly use Spack shell commands and integration, issue
module load spack/version
Then issue module avail to obtain an overview of the software installed in that Spack instance.
All information about the experimental Spack installation introduced in 2018 is now obsolete. We plan to delete these old versions from the system in the near future.

Notes on Migrating from SGE to Slurm

Both SGE and Slurm are job schedulers, but their command-line and job-script options differ substantially. You therefore need to convert your existing SGE scripts to Slurm. Documentation to help you with the transition is in preparation.
On our clusters, we provide a script to help you convert SGE job scripts to their Slurm counterparts:
/usr/site/hpc/bin/sge2slurm.py
Please check whether the output reflects your intentions before submitting your modified job script.
We are planning to offer short introductory workshops about migrating from SGE to Slurm. Details and proposed dates for appointments will follow.
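
To give a rough idea of what a converted script may look like, here is a minimal sketch of a Slurm job script for a hybrid MPI/multithreaded program. It uses only standard sbatch options; the resource values and the program name my_hybrid_program are placeholders, and no LEO5-specific settings (such as partitions) are implied. The SGE lines in the comments show commonly used equivalents and are not the output of sge2slurm.py.

```bash
#!/bin/bash
# Illustrative Slurm job script for a hybrid MPI/multithreaded program.
# All values below are placeholders, not LEO5 defaults.

# Job name (SGE: #$ -N hybrid_test)
#SBATCH --job-name=hybrid_test
# Number of MPI ranks
#SBATCH --ntasks=4
# Threads per MPI rank
#SBATCH --cpus-per-task=8
# Wall-clock limit (SGE: #$ -l h_rt=01:00:00)
#SBATCH --time=01:00:00

# Use as many OpenMP threads as Slurm assigned per task (1 outside a job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "threads per task: ${OMP_NUM_THREADS}"

# Launch step, commented out because my_hybrid_program is hypothetical:
# srun ./my_hybrid_program
```

Unlike SGE with its -cwd option, Slurm starts the job in the directory from which it was submitted, so no extra directive is needed for that.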

For details, see the Slurm tutorial.

How to Access LEO5

All existing LEO users will have access to LEO5. Connect to LEO5 via
ssh cXXXyyyy@leo5.uibk.ac.at

What Happens to the Existing Clusters

LEO3e and LEO4 operation will continue. Central components are covered by maintenance contracts until March 2024 (LEO3e) and November 2024 (LEO4).
LEO3 has been dismantled. Part of its hardware will replace the LCC2 teaching cluster (LCC3).
MACH2 is past End Of Life as of March 31, 2023. Jobs with up to 2 TB or 3 TB of local memory can be run on LEO5 (6 nodes with 2 TB) or LEO4 (1 node with 3 TB), respectively. Currently, no direct replacement is in sight for larger jobs. The ZID and the Research Area Scientific Computing have contacted currently active users about their needs and possible mitigations in the near future.
Currently, there appears to be no urgent need for a machine with more than 3TB single-node memory. Should the need arise in the future, interested users are encouraged to contact the Research Area Scientific Computing at their earliest convenience.

We are looking forward to another chapter of successful service and cooperation with our HPC community. Please do not hesitate to contact us if you need advice or support using our systems.