System: Leo3, Leo3e | Mach | VSC3
Operating System: Leo3: CentOS 6.3; Leo3e: CentOS 7.1; Mach: SuSE SLES 11.1; VSC3: Scientific Linux 6.6
Architecture
Leo3, Leo3e: Infiniband Cluster
Leo3:
162 nodes (physical machines)
12 CPUs/node (2 sockets @ 6 cores)
1944 CPUs total
24 GB/node
2 GB/CPU
Leo3e:
45 nodes (physical machines)
20 CPUs/node (2 sockets @ 10 cores)
900 CPUs total
64 (nodes 44+45: 512) GB/node
3.2 (nodes 44+45: 25.6) GB/CPU
Mach: ccNUMA SMP
1 node = 256 vnodes (virtual nodes = processor sockets)
8 CPUs/vnode (8 cores per socket)
2048 CPUs total
8 GB/CPU
64 GB/vnode
16 TB total
VSC3: Infiniband Cluster
approx. 2000 nodes
16 CPUs/node (2 sockets @ 8 cores)
32000 CPUs total
Standard nodes: 64 GB → 4 GB/CPU
Big nodes: 128 or 256 GB → 8/16 GB/CPU
Topology
Leo3 (Leo3e): 7 (2) units with up to 24 nodes each. Blocking factor 1:2 between units.
Mach: Fat Tree. Batch scheduler allocates vnodes on a best-fit basis.
VSC3: 8 islands, each consisting of up to 24 units with 12 nodes each (i.e. up to 2304 nodes total). Blocking factor 1:2 within islands, 1:4 between islands.
File System Architecture
Leo3, Leo3e:
$HOME: Home directory /home/group/user, shared, backup
$SCRATCH: Scratch directory /scratch/user, shared, no backup
Mach:
$HOME: Home directory /home/group/user, backup
$SCRATCH: Scratch directory /scratch/group/user, no backup
VSC3:
$HOME: Home directory /home/project/user, shared, no backup
$SCRATCH: Local scratch directory /scratch, specific to each node, no backup, automatic deletion some time after the job terminates
$GLOBAL: Global scratch directory /global/project/user, shared, no backup
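The scratch areas are intended for job I/O. Below is a minimal sketch (assuming only the environment variables described above; the file names mydata.in, myresult.out and the program mysim are placeholders) of the shell commands in a batch script that stage data to the scratch area, run there, and copy results back before the job ends:

# Stage input data from $HOME to a job-private scratch directory
WORKDIR="$SCRATCH/myjob.$$"              # placeholder directory name
mkdir -p "$WORKDIR"
cp "$HOME/mydata.in" "$WORKDIR/"
cd "$WORKDIR"

# Run the (placeholder) program in the scratch area
"$HOME/bin/mysim" mydata.in > myresult.out

# Copy results back: scratch has no backup, and on VSC3 the node-local
# $SCRATCH is deleted some time after the job terminates
cp myresult.out "$HOME/"
cd "$HOME" && rm -rf "$WORKDIR"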
Job Scheduler
Leo3, Leo3e: SGE
Mach: PBS Pro
VSC3: SLURM
Allocation Granularity
Leo3, Leo3e: 1 CPU.
Leo3: 1 node = 12 CPUs + 24 GB memory
Leo3e: 1 node = 20 CPUs + 64 GB (nodes 44+45: 512 GB) memory
Note that several jobs from different users may run on the same node, so these systems are suitable for the entire range from small sequential programs and multithreaded (e.g. OpenMP) programs with up to 12 (Leo3) or 20 (Leo3e) parallel threads, up to relatively large parallel MPI jobs using hundreds of CPUs. Please specify realistic memory requirements to assist job placement (see the sketch after this block).
Mach: Multiples of 1 vnode (= 8 CPUs, 64 GB memory each)
Mach is a special machine for large parallel jobs with high memory demands. If you need little memory or fewer than 8 CPUs per program run, please consider using another system.
VSC3: Multiples of 1 node (= 16 CPUs + 64 GB, 128 GB or 256 GB memory each)
Each node is assigned to a job exclusively. Jobs can span many nodes, up to very large parallel MPI jobs using many hundreds of CPUs. If your individual programs cannot profit from the minimum degree of parallelism (16 threads or tasks), please consider employing a job farming scheme (description by LRZ; please note that their conventions are different) or using a different system.
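To illustrate the different granularities, here is a sketch of submit commands (jobscript.sh and all numbers are placeholders; the options are the ones documented in the directive sections below):

# Leo3/Leo3e (SGE): 4 CPUs with 2 GB per slot (h_vmem is per slot, so 4 x 2 GB = 8 GB total)
qsub -pe openmp 4 -l h_vmem=2G jobscript.sh

# Mach (PBS Pro): one vnode = 8 CPUs, the smallest sensible request on this machine
qsub -l select=1:ncpus=8 jobscript.sh

# VSC3 (Slurm): one full node = 16 CPUs on the default 64 GB partition
sbatch -N 1 -p mem_0064 jobscript.sh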
Job Submission
Leo3, Leo3e (SGE): qsub scriptfile
Mach (PBS Pro): qsub scriptfile
VSC3 (Slurm): sbatch scriptfile
Query Jobs
Leo3, Leo3e (SGE):
qstat -u $USER
List all my jobs
qstat -j jobid
Detailed information on jobid
Mach (PBS Pro):
qstat [-wTx] -u $USER
List all my jobs [-w wide format, -T estimated start time, -x include finished jobs]
qstat [-x] -f jobid
Detailed information on jobid [-x if finished]
VSC3 (Slurm):
squeue -u $USER
List all my jobs
squeue -j jobid -o '%all'
scontrol [-dd] show job jobid
Detailed information on jobid
Cancel Jobs
Leo3, Leo3e (SGE): qdel jobid
Mach (PBS Pro): qdel jobid
VSC3 (Slurm): scancel jobid
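A typical submit/query/cancel round trip on VSC3 looks as follows (a sketch; the job id 12345 and the file name jobscript.sh are placeholders; on Leo3/Leo3e and Mach the equivalent uses qsub, qstat and qdel as listed above):

sbatch jobscript.sh        # submit; Slurm prints the assigned job id
squeue -u $USER            # list my jobs and their states
scontrol show job 12345    # detailed information on the job (placeholder id)
scancel 12345              # cancel the job if it is no longer needed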
Format of batch script file
All systems permit supplying processing options either as command line parameters of the submit command (qsub/sbatch) or as directives in the batch script file (recommended and described below); see the sketch after this list.
Format of command line options: [ option [parameters] ] [...] - multiple options in one command line
Format of directives: prefix option [parameters] - separate line for each option
The prefix depends on the batch system: #$ (SGE - leo3, leo3e), #PBS (PBS Pro - mach), #SBATCH (Slurm - vsc3)
Options are documented in the man page of the respective submit command (man qsub or man sbatch).
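For example (a sketch; myjob and jobscript.sh are placeholder names), the same job name can be given either on the command line or as a directive inside the script, shown here for SGE (Leo3, Leo3e) and Slurm (VSC3):

# Command line form
qsub -N myjob jobscript.sh       # SGE
sbatch -J myjob jobscript.sh     # Slurm

# Equivalent directive form inside jobscript.sh
#$ -N myjob                      # SGE
#SBATCH -J myjob                 # Slurm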
General Scheduler Directives
Leo3, Leo3e (SGE):
#!/bin/bash
#$ -N jobname (optional)
#$ -o outfile (default: jobname.ojobid)
#$ -e errfile (default: jobname.ejobid)
#$ -j yes|no  (join stderr to stdout)
#$ -cwd  (run job in current directory)
         (default: $HOME)
Mach (PBS Pro):
#!/bin/bash
#PBS -N jobname (optional)
#PBS -o outfile (default: jobid.o)
#PBS -e errfile (default: jobid.e)
#PBS -j oe|eo|n (join stderr to stdout | stdout to stderr | no join)
# after last directive
cd $PBS_O_WORKDIR  (run job in current directory)
                   (default: $HOME)
VSC3 (Slurm):
#!/bin/bash
#SBATCH -J jobname   (optional)
#SBATCH -o outfile   (default: slurm-%j.out)
                     (stderr goes to stdout)
#SBATCH -D directory (run job in specified directory)
                     (default: current directory)
#SBATCH -p partition (optional; selects type of node by required memory in GB)

partition ::= { mem_0064 (default) | mem_0128 | mem_0256 }
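Putting the general directives together, a minimal job script for VSC3 might look like this (a sketch; the job name, output file and program ./myprog are placeholders; the Leo3/Leo3e and Mach variants use the corresponding #$ and #PBS directives above):

#!/bin/bash
#SBATCH -J testjob            # job name (placeholder)
#SBATCH -o testjob-%j.out     # output file, %j is replaced by the job id
#SBATCH -p mem_0064           # default 64 GB nodes

./myprog                      # placeholder for the actual program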
Notification Directives
Leo3, Leo3e (SGE):
#$ -M mail-address
#$ -m b|e|a|s|n  (begin|end|abort|suspend|none)
Mach (PBS Pro):
#PBS -M mail-address
#PBS -m b|e|a|n  (begin|end|abort|none)
VSC3 (Slurm):
#SBATCH --mail-type=BEGIN|END|FAIL|REQUEUE|ALL
#SBATCH --mail-user=user
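For example, to be notified when a job ends or aborts (a sketch; user@example.org is a placeholder address):

# SGE (Leo3, Leo3e)
#$ -M user@example.org
#$ -m ea                             # mail at end and abort

# Slurm (VSC3)
#SBATCH --mail-user=user@example.org
#SBATCH --mail-type=END,FAIL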
Resource Directives
Leo3, Leo3e (SGE):
Run time
#$ -l h_rt=[hh:mm:]ss
Tasks, Threads
See Task Distribution below.
Per slot virtual memory size (bytes)
#$ -l h_vmem=size
Default: 2 GB (Leo3), 1 GB (Leo3e)
Per slot stack size limit (bytes)
#$ -l h_stack=size
Mach (PBS Pro):
Run time
#PBS -l walltime=[hh:mm:]ss
Tasks, Threads
#PBS -l select=ntask:ncpus=nthread
Request ntask times nthread CPUs. See Task Distribution below.
Memory
#PBS -l select=ntask:mem=size{mb|gb}
Request size MB or GB of memory for each of ntask tasks.
VSC3 (Slurm):
Run time
#SBATCH -t mm|mm:ss|hh:mm:ss|days-hh[:mm[:ss]]
Nodes, Tasks, Threads
#SBATCH -N minnodes[-maxnodes]
Request number of nodes for the job.
#SBATCH -n ntasks
Request resources for ntasks tasks.
#SBATCH -c nthreads
Request nthreads CPUs per task (default: 1).
Memory
#SBATCH --mem=MB
Request MB megabytes per node.
#SBATCH --mem-per-cpu=MB
Request MB megabytes per CPU.
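For example, requesting 2 hours of run time plus memory (a sketch with placeholder values):

# SGE (Leo3, Leo3e): 2 h wall clock time, 2 GB virtual memory per slot
#$ -l h_rt=02:00:00
#$ -l h_vmem=2G

# PBS Pro (Mach): 2 h wall clock time, 2 tasks with 8 CPUs and 32 GB each
#PBS -l walltime=02:00:00
#PBS -l select=2:ncpus=8:mem=32gb

# Slurm (VSC3): 2 h wall clock time, 32768 MB per node
#SBATCH -t 02:00:00
#SBATCH --mem=32768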
Task Distribution
Leo3, Leo3e (SGE):
#$ -pe parallel-env nslots
Request a total of nslots CPUs for the parallel environment.

parallel-env is one of:
openmpi-Xperhost
(Leo3: X={1|2|4|6|8|12})
(Leo3e: X={1|2|4|6|8|10|12|14|16|18|20})
openmpi-fillup
openmp
Mach (PBS Pro):
#PBS -l select=ntask:ncpus=1
For MPI jobs running ntask processes.
#PBS -l select=1:ncpus=nthread
For OpenMP or other multithreaded jobs running nthread threads.
VSC3 (Slurm):
#SBATCH -m node-distribution-method[:socket-distribution-method]

where
node-distribution-method ::= {block|cyclic|arbitrary|plane=options}
socket-distribution-method ::= {block|cyclic}

#SBATCH -c cpus-per-task
Hybrid programming: number of threads per MPI task.

For more options and details, see man sbatch.
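Two sketches (the MPI launcher mpirun, the use of srun, and the program names are assumptions that depend on the locally installed software environment): an MPI job on Leo3e using 40 CPUs in fill-up mode, and a hybrid MPI+OpenMP job on VSC3 with 2 nodes, 4 tasks and 8 threads per task:

# Leo3e (SGE): 40 CPUs, nodes filled up one after another
#$ -pe openmpi-fillup 40
mpirun ./mympiprog               # mpirun and ./mympiprog are placeholders

# VSC3 (Slurm): 2 nodes, 4 MPI tasks, 8 OpenMP threads per task (2 x 16 CPUs)
#SBATCH -N 2
#SBATCH -n 4
#SBATCH -c 8
export OMP_NUM_THREADS=8
srun ./myhybridprog              # ./myhybridprog is a placeholder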

Interactive jobs
Leo3, Leo3e (SGE):
{qsh|qlogin} [ -pe parallel-env np ]
qsh starts an xterm session,
qlogin starts an ssh-like interactive session.
Mach (PBS Pro):
qsub -I [ -l select=ntask:ncpus=nthread ]
Starts an ssh-like interactive session.
VSC3 (Slurm): TBD
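For example, to open an interactive session with 4 CPUs (a sketch; the CPU count is a placeholder):

# Leo3, Leo3e (SGE)
qlogin -pe openmp 4

# Mach (PBS Pro)
qsub -I -l select=1:ncpus=4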

Remarks
Any non-directive (e.g. a command) terminates processing of directives in the script. Slurm options may be parameterized using macros, e.g. %j (job id), %u (user name).