

LEO5 Tentative Schedule

1 Feb 2023

Start of Friendly User Test Operation

  • Documentation is in progress.
  • System operation may be interrupted at any time, possibly without prior notice. Please do not depend on stable operation of LEO5 in this phase.
  • Software will only gradually become available (as team resources permit) or undergo changes.
  • Your friendly feedback helps us discover errors and improve the configuration.
1 May 2023

Start of regular operation.

LEO5 Key Features

The system consists of 62 compute nodes, each with two 32-core Intel Xeon 8358 (Ice Lake) sockets (i.e. 64 cores/node). All nodes are connected by a uniform high-performance InfiniBand (IB HDR100) interconnect.

LEO5 Hardware

Node Type    # Nodes   Cores/Node   Memory/Node   GPUs
Login              1           64          2 TB   2 x NVIDIA A30
Standard          33           64        256 GB   -
GPU A30           22           64        256 GB   2 x A30
GPU A40            2           64        256 GB   2 x A40
Big Memory         5           64          2 TB   -
Total             63         3968       26.8 TB   46 x A30 + 4 x A40
  • $SCRATCH net storage capacity: 2.1 PB (120 TB SSDs) - file system: IBM Spectrum Scale 5.1.4
  • Total performance (nominal): 200 TFlop/s (CPU, FP64), 600 TFlop/s (GPU, FP32), 220 TFlop/s (GPU, FP64).
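
The totals row of the hardware table can be cross-checked with a few lines of shell arithmetic. The sketch below assumes 2 TB = 2048 GB. It suggests that the core total (3968) counts only the 62 compute nodes, while the node, memory and GPU totals also include the login node. This reading is inferred from the numbers and is not stated explicitly on this page.

```shell
#!/usr/bin/env bash
# Node counts copied from the hardware table above.
login=1; standard=33; gpu_a30=22; gpu_a40=2; bigmem=5

total_nodes=$(( login + standard + gpu_a30 + gpu_a40 + bigmem ))
compute_nodes=$(( total_nodes - login ))

# 64 cores per node; the table total matches the compute nodes only.
compute_cores=$(( compute_nodes * 64 ))

# Memory in GB, assuming 2 TB = 2048 GB; the login node is included here.
mem_gb=$(( (standard + gpu_a30 + gpu_a40) * 256 + (bigmem + login) * 2048 ))

# Two GPUs per GPU node, plus the two A30 cards in the login node.
a30=$(( gpu_a30 * 2 + login * 2 ))
a40=$(( gpu_a40 * 2 ))

echo "nodes=$total_nodes compute_nodes=$compute_nodes cores=$compute_cores mem_gb=$mem_gb a30=$a30 a40=$a40"
```

The resulting 26880 GB corresponds to the 26.8 TB quoted in the table when divided by 1000.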

This configuration includes two extensions: one funded by a financial contribution from the Research Area Scientific Computing, and a dedicated extension (16 nodes, 320 TB storage) that was added at the request of University top management and financed from the investment funds of a few research groups. Fully integrating these otherwise dedicated resources into the shared infrastructure lets the general community benefit from the investments in times of low usage.

LEO5 Software

Compared to our previous LEO systems, the software setup has been modernized to reflect current standards:

  • The Linux distribution Rocky Linux 8.6 is a clone of RHEL 8.6. Rocky receives substantial support from major players in the industry and is fully compatible with all third-party software used in our installation.
  • We continue using the high performance parallel file system Spectrum Scale (formerly GPFS), which is in use by many TOP 500 installations, including LRZ and VSC.
  • We have replaced the obsolete SGE load manager with Slurm, which is now standard in most major HPC installations worldwide. In particular, our Slurm configuration natively supports hybrid multithread/MPI workloads, has substantially improved memory management, and gives users the option to employ hardware hyperthreads for suitable workloads.
  • Deployment of most of the standard software has been automated using the Spack HPC software package manager. Spack is being used by many major HPC sites and will allow us to satisfy many basic software installation requests quickly.
  • For software involving Python and R, we continue using Conda.
  • Some software will still be installed manually. Due to reduced workforce, deployment of some domain-specific software products may need to be delegated to institutes.

Notes on Software Installed by Spack

  • Software is still accessed via Environment Modules.
  • In prior LEO systems, variants of libraries were implicitly loaded depending on which compiler toolchain (e.g. GNU, Intel, ...) had been loaded. This implicit dependency no longer exists. All module names explicitly contain the compiler with which they were built.
  • Format of module names:
    softwarename/softwareversion-compiler-compilerversion-hash
    Example: zlib/1.2.13-gcc-8.5.0-xlt7jpk
    All module names begin with the name of the software, followed by a slash, the software version and the compiler toolchain, and end with a seven-character hash. The hash makes it possible to distinguish between variants that are not evident from the module name. Some module names may contain further software dependencies (e.g. Python, MPI). To discover the specifications with which a module was built, enter
    module help modulename
  • At runtime, you need to load only those modules that make shell-level commands available. Library dependencies are coded into executables using the RPATH attribute. With our UIBK Spack installation, we also set the LD_RUN_PATH variable. As a result, executables that you build with Spack-provided compilers and libraries will also contain RPATH entries which uniquely identify libraries at runtime.
  • New Spack versions are released approximately once per year, supporting new versions of existing software and new functionality. We will make these versions available to users by installing new Spack release-instances. We try to keep these reasonably complete. After login, your MODULEPATH will always refer to the newest Spack release-instance.
  • When users request individual software versions more recent than those provided by the stable Spack releases, we will install additional instances based on the develop version of Spack as the need arises.
  • All available Spack instances may be listed by issuing
    module avail spack
    To access any given Spack instance or directly use Spack shell commands and integration, issue
    module load spack/version
    Then issue module avail to obtain an overview of the software installed in that Spack instance.
  • All information about the experimental Spack installation introduced in 2018 is now obsolete. We plan to delete these old versions from the system in the near future.
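
The module naming scheme described above can be taken apart with plain bash parameter expansion. The following is a minimal sketch, not a site-provided tool: the function name parse_module_name is hypothetical, and the sketch assumes that the software version itself contains no dash. Any further dependencies embedded in a module name (e.g. Python, MPI) end up in the built_with field.

```shell
#!/usr/bin/env bash
# Split a module name of the form
#   softwarename/softwareversion-compiler-compilerversion-hash
# into its parts. Assumes the software version contains no dash.
parse_module_name() {
    local mod="$1"
    local name="${mod%%/*}"          # everything before the slash
    local rest="${mod#*/}"           # everything after the slash
    local hash="${rest##*-}"         # text after the last dash
    rest="${rest%-*}"                # drop the trailing "-hash"
    local version="${rest%%-*}"      # text before the first dash
    local built_with="${rest#*-}"    # compiler, compiler version, extras
    printf 'name=%s version=%s built_with=%s hash=%s\n' \
        "$name" "$version" "$built_with" "$hash"
}

parse_module_name "zlib/1.2.13-gcc-8.5.0-xlt7jpk"
# prints: name=zlib version=1.2.13 built_with=gcc-8.5.0 hash=xlt7jpk
```

For anything beyond a quick look at the name, module help modulename remains the authoritative source for how a module was built.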

Notes on Migrating from SGE to Slurm

  • Both SGE and Slurm are job schedulers, but they have very different sets of options on the command line and in job scripts. So you need to convert your existing SGE scripts to Slurm. Documentation to help you with the transition is in preparation.
  • On our clusters we are providing a script to help you convert SGE job-scripts to their Slurm counterpart:
    /usr/site/hpc/bin/sge2slurm.py
    Please check that the output reflects your intentions before submitting your modified job script.
  • We are planning to offer short introductory workshops about migrating from SGE to Slurm. Details and proposed dates for appointments will follow.

For details, see the Slurm tutorial.
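
To give a rough idea of what a converted job script may look like, here is a minimal sketch of a Slurm script for a hybrid multithread/MPI job. The #SBATCH options used are standard Slurm options, but all values, the job name and the program name my_hybrid_program are placeholders chosen for illustration. Site-specific settings such as partitions or accounts are not shown, and this is not the output of sge2slurm.py.

```shell
#!/bin/bash
# Minimal Slurm job script sketch; all values are placeholders, not site defaults.

# Job name (SGE equivalent: #$ -N hybrid_test)
#SBATCH --job-name=hybrid_test
# Number of MPI tasks
#SBATCH --ntasks=4
# Threads (CPUs) per MPI task
#SBATCH --cpus-per-task=8
# Memory per allocated CPU
#SBATCH --mem-per-cpu=2G
# Wall-clock limit (SGE equivalent: #$ -l h_rt=01:00:00)
#SBATCH --time=01:00:00
# Output file; %x expands to the job name, %j to the job ID
#SBATCH --output=%x-%j.out

# Outside a Slurm job SLURM_CPUS_PER_TASK is unset, so fall back to 1.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "job=${SLURM_JOB_ID:-none} threads_per_task=${OMP_NUM_THREADS}"

# Launch step (hypothetical executable name), left commented out:
# srun ./my_hybrid_program
```

Such a script would be submitted with sbatch followed by the script file name. Because the #SBATCH lines are ordinary comments to bash, the script can also be run directly for a quick syntax check.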

How to Access LEO5

  • All existing LEO users will have access to LEO5. Connect to LEO5 via
      ssh cXXXyyyy@leo5.uibk.ac.at

What Happens to the Existing Clusters

  • LEO3e and LEO4 operation will continue. Central components are covered by maintenance contracts until March 2024 (LEO3e) and November 2024 (LEO4).
  • LEO3 has been dismantled. Part of its hardware will replace the LCC2 teaching cluster (LCC3).
  • MACH2 is approaching its end of life. The planned last day of access is March 31, 2023. Jobs that need up to 2 TB or 3 TB of local memory can be run on LEO5 (5 nodes with 2 TB) or LEO4 (1 node with 3 TB), respectively. Currently, no direct replacement is in sight for larger jobs. The ZID and the Research Area Scientific Computing will contact currently active users in the near future about their needs and possible mitigations.

We are looking forward to another chapter of successful service and cooperation with our HPC community. Please do not hesitate to contact us if you need advice or support using our systems.
