Leo5: Known Problems and Configuration Changes

Known Problems

Unstable Nodes

As of 14 Feb, 2023, Leo5 worker nodes occasionally become unresponsive and need to be rebooted. Problem analysis is underway.
Preliminary results (17 Feb 2023) indicate that there seems to be a bug in the kernel (?) of our systems that gets triggered by the way we had Slurm configured to gracefully handle (minor) memory-over-consumption. As a work-around we disallow that completely for now (see below).

Intel OneAPI Compilers

The intel-oneapi-compilers modules contain two generations of compilers, Intel-Classic (based on the GNU compiler front end - declared deprecated by Intel), and Intel-OneAPI (based on an LLVM front end - in active development and currently supported by Intel).

In Leo5 acceptance tests - in particular in connection with OpenMPI and possibly Fortran (Netlib HPL) - there appeared to be signs of unexpected results with the OneAPI compilers. As far as we are informed, Intel are looking into this issue.

Problems were also reported by users. Some standard open source packages, such as Kerberos and OpenSSH, do not build with Spack using the OneAPI toolchain.

For the time being, we have removed all packages built with OneAPI from our default Spack-leo5-20230116 instance (spack/v0.19-leo5-20230116-release). For users interested in exploring Intel OneAPI, we are deploying these packages, using the latest available Intel compilers, in the spack/v0.20-leo5-20230124-develop instance.

Mpich and Intel MPI

The introduction of Slurm has made it much easier to support multiple MPI implementations, in particular those that often come with 3rd party software. Our OpenMPI integration with Slurm works as it should and can be used without technical limitations.

However, we currently have an issue with the Slurm/Cgroups integration of Mpich and Intel MPI, which causes all remote processes to be limited to CPU#0 when using the mpirun/mpiexec command. ~~We are looking into this problem - for the time being, jobs using Mpich or IntelMPI should be run only in single-node configurations.~~

Mpich and Intel MPI work fine if you place your processes with Slurm's srun --mpi=pmi2 command, so this is what we recommend. The option --mpi=pmi2 is necessary lest all tasks be started as Rank 0.

RPATH Works Only With OS Supplied Compilers

In LEO5 Intro - RPATH we describe that executables using libraries built with Spack will get a list of all requisite libraries in the RPATH attribute, so there is no need to module load the libraries at runtime. This effect is achieved by having module load set the $LD_RUN_PATH environment variable to a list of the directories containing these libraries at link time.

This mechanism currently works only when you use the OS-supplied compilers (gcc 8.5.0). When any of the compilers installed by Spack are used, the mechanism is effectively broken (overridden by an undocumented change to Spack since version 0.17).

As a temporary workaround, we recommend to do either of the following (pending detailed verification):

Either:
- When building your software with one of the Spack-supplied compilers, make note of the environment modules needed.
- Before running your programs, first load the modules as noted in step 1, then do
  export LD_LIBRARY_PATH=$LD_RUN_PATH. Do this if you do not want to re-build your software.
Or (recommended for every new build of your software):
- Add the option
  -Wl,-rpath=$LD_RUN_PATH
  to the commands by which your programs are linked, e.g. by defining
  ADD_RPATH = -Wl,-rpath=$(LD_RUN_PATH)
  in your Makefile and making sure that the link step in your Makefile contains
  $(CC) .... $(ADD_RPATH)
  or similar. This will add the contents of LD_RUN_PATH to the RPATH attribute of your executable, and there will be no need to set LD_LIBRARY_PATH at runtime.

This should fix the problem for the time being. The root cause of the problem is a deliberate change of behaviour by the Spack developers. Unfortunately, at the moment, there appears no simple way to restore the previous behaviour (which was consistent with documented behaviour of compilers) without necessary user interventions.

Hints

Hyperthreading

The Slurm option --threads-per-core as originally documented yields incorrect CPU-affinity for single-threaded tasks. Use --hint=multithread instead.

MPI and Python

Currently, the following Python modules have been tested for MPI (mpi4py):

~~Anaconda3/2023.03/python-3.10.10-anaconda+mpi-conda-2023.03~~ obsolete(*)
Anaconda3/2023.03/python-3.10.10-anaconda-mpi-conda-2023.03
Anaconda3/2023.03/python-3.11.0-numpy+mpi-conda-2023.03

These are using the MPICH libraries. To correctly map MPI ranks to individual processes in batch jobs, you need to use Slurm's srun command with the following options:

srun --mpi=pmi2 --export=ALL ./myprogram.py

Omitting the --mpi option will cause all processes to be run independently and have rank 0.

(*) Note: Module names containing a "+" character no longer work with newer versions of the Environment Modules software (e.g. as installed on LEO5). All affected module names have been duplicated with "+" replaced by "-".