How to Use the ITP HPC Clusters

    Login to the HPC Cluster
Change your password
Set up your environment
Using Sun Grid Engine

Login to the HPC Cluster

Depending on your account, use the appropriate login node (also referred to as the master node) of one of the following systems:

Cluster Login node RSA key fingerprint
REGULUS regulus.uibk.ac.at 97:a2:1e:0d:d8:7e:2a:b7:44:1c:6a:19:7e:39:f5:b7
TEAZER teazer.uibk.ac.at cd:61:96:a2:42:01:1f:19:42:cf:76:06:b8:1e:b9:1d

Each server listed above is the login node to the corresponding High Performance Computing (HPC) cluster. The cluster can be contacted by slogin or ssh. Login to the HPC cluster with, e.g. for REGULUS:

slogin regulus.uibk.ac.at -l <user-name>

or equivalently:

ssh <user-name>@regulus.uibk.ac.at

For Windows users: See the ZID's Getting Started Tutorial to establish a connection within Windows.

Remote access

For security reasons, our Linux cluster systems are only reachable from inside the University's IP address range. If you want to access the systems from outside, you need to set up a VPN connection. See the ZID's instructions for setting up a VPN client for various operating systems.

Change your password

Change your password with the command yppasswd. After typing your current password, you enter your new password twice.

$ yppasswd
Changing NIS account information for "user ID" on regulus.uibk.ac.at.
Please enter old password: <type old value here>
Changing NIS password for "user ID" on regulus.uibk.ac.at.
Please enter new password: <type new value>
Please retype new password: <re-type new value>

The NIS password has been changed on regulus.uibk.ac.at.

Set up your environment

The environment modules package provides a convenient way to customize your Linux environment (PATH, MANPATH, INCLUDE, LD_LIBRARY_PATH) on the fly. It lets you cleanly set and unset your path and environment variables by loading or unloading an installed software package with a module file (module load module_file). The command module avail lists the installed software, i.e. the associated module files available on the cluster, such as the Intel Compiler.

Here you can find more detailed information about the modules environment and its usage.
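A typical session on the login node might look like the following sketch. The module name "intel" is an assumption for illustration; check module avail for the names actually installed on the cluster.

```shell
# list all software packages (module files) available on the cluster
module avail

# load the Intel compiler into your environment (name is an assumption)
module load intel

# show which modules are currently loaded in this session
module list

# cleanly remove the package from your environment again
module unload intel
```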

Using Sun Grid Engine

The cluster's job scheduling is operated by the open-source Sun Grid Engine (SGE) system (version 6.1u4). As a user you should be familiar with the following commands: qsub, qstat, qdel and qhost, which are briefly described below. For more information, consult the respective man pages or the vendor documentation of SGE 6.1u4, especially the SGE User's Guide.

Submitting batch jobs (qsub)

The command qsub allows you to submit jobs to the batch system. It uses the following syntax:

qsub [options] scriptfile [script arguments]

where scriptfile represents the path to either a binary or a script containing the commands to be run by the job using a shell.

There are two ways to submit a job.

Method 1: (not recommended)

You may add the options directly after the qsub command, like:

qsub -q all.q -o output.dat -i input.dat -l swap_free=200M scriptfile

Method 2: (recommended)

Alternatively, the options can be written to a file (a job description file), which is then passed to qsub:

qsub job

The content of the file "job" may look like the following:

#$ -q all.q
#$ -o output.dat
#$ -i input.dat
#$ -l swap_free=200M

./scriptfile

Description of the most important options of qsub:

Input/Output
-i path standard input file
-o path standard output file
-e path standard error file
-j yes|no join standard error output to standard output (yes or no)
Notification
-M email-address notifications will be sent to this email address
-m b|e|a|s|n notifications on different events:
b ... begin, e ... end, a ... abort, s ... suspend, n ... no mail (default)
Do not forget to specify an email address (with -M) if you want to get these notifications.
Resources
-l h_rt=[hours:minutes:]seconds requested real time;
the default depends on the queue
-l mem_free=size request free memory of "size" bytes
-l swap_free=size request to use swap space with "size" bytes
-w v check whether the syntax of the job is okay (do not submit the job)
-hold_jid job-id start job only if the job with the job id "job-id" has finished
Other useful options
-N name name of the job
Parallel jobs / parallel environments
-pe parallel-environment process-number You have to specify a parallel environment and the number of processes on which your MPI application should run. Parallel environments let you control how the job's processes are distributed. The following parallel environments are available:

parallel environment description
openmpi-4perhost Each host gets four processes.
openmpi-8perhost Each host gets eight processes.
openmpi-16perhost Each host gets 16 processes.
openmpi-fillup, mpich-fillup The batch system fills up each host with processes up to its host process limit.
openmpi-roundrobin, mpich-roundrobin Processes are scheduled cyclically according to the round-robin method.
openmp This environment supports working with OpenMP threads.
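As a sketch, a 16-process MPI job distributed four processes per host could be requested as follows (the file name job-file is illustrative):

```shell
# request the openmpi-4perhost parallel environment with 16 slots
qsub -pe openmpi-4perhost 16 job-file
```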

There are differences to consider not only between submitting sequential and parallel jobs, but also between the different supported parallel programming models. The following examples illustrate the different procedures:

Sequential batch jobs

qsub job-file

where the content of the file "job-file" looks like:
(if you just copy&paste this example please be aware of line breaks)

#!/bin/bash

# The job should be placed into the queue 'all.q'.
#$ -q all.q

# Redirect output stream to this file.
#$ -o output.dat

# Redirect error stream to this file.
#$ -e error.dat

# The batch system uses the current directory as its working directory.
# Both files (output.dat and error.dat) will be placed in the current
# directory. The batch system expects to find the executable in this directory.
#$ -cwd

# Send status information to this email address.
#$ -M Karl.Mustermann@uibk.ac.at

# Send an e-mail when the job is done.
#$ -m e

# This is the file to be executed.
./script.sh
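The file script.sh referenced in the job file above is an ordinary executable script; a minimal sketch of its content might be:

```shell
#!/bin/bash
# minimal payload for illustration: report which node the job runs on
echo "Job running on host: $(hostname)"
```

Remember to give it execution rights with chmod +x script.sh before submitting.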

Parallel (MPI) batch jobs

qsub job-file

where the content of the file "job-file" looks like:
(if you just copy&paste this example please be aware of line breaks)

#!/bin/bash

# The job should be placed into the queue 'par.q'.
#$ -q par.q

# Redirect output stream to this file.
#$ -o output.dat

# The batch system should use the current directory as its working directory.
#$ -cwd

# Join the error stream to the output stream.
#$ -j yes

# This is my e-mail address for notifications. I want to receive all
# notifications on my official e-mail account.
#$ -M Karl.Mustermann@uibk.ac.at

# Send me an e-mail when the job has finished.
#$ -m e

# Use the parallel environment "mpich-2perhost", which assigns two processes
# per host. If there are not enough machines to run the MPI job on
# 16 processors, the batch system may also use fewer than 16, but the job
# should not run on fewer than 8 processors.
#$ -pe mpich-2perhost 8-16

./script.sh

MPICH implementation: (outdated)

When using MPICH the script file "script.sh" may look like this:

#!/bin/bash

mpirun -np $NSLOTS -machinefile $TMPDIR/machines mpi_exec

Assure that "script.sh" has execution rights (chmod +x script.sh).

OpenMPI implementation:

For OpenMPI the -machinefile option of mpirun must not be specified. Use a script file "script.sh" like the following:

#!/bin/bash

mpirun -np $NSLOTS mpi_exec

Assure that "script.sh" has execution rights (chmod +x script.sh).

Parallel (OpenMP) batch jobs

qsub job-file

where the content of the file "job-file" looks like:
(if you just copy&paste this example please be aware of line breaks)

#!/bin/bash

# The job should be placed into the queue 'par.q'.
#$ -q par.q

# Redirect output stream to this file.
#$ -o output.dat

# The batch system should use the current directory as its working directory.
#$ -cwd

# Join the error stream to the output stream.
#$ -j yes

# This is my e-mail address for notifications. I want to receive all
# notifications on my official e-mail account.
#$ -M Karl.Mustermann@uibk.ac.at

# Send me an e-mail when the job has finished.
#$ -m e

# Use the parallel environment "openmp".
#$ -pe openmp 4-8

export OMP_NUM_THREADS=$NSLOTS
./script.sh

The script file must have execution rights (chmod +x script.sh)

Note that it makes no sense to request more processes than there are cores on the largest machines in the cluster (currently a maximum of 48 cores).
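Inside a running job, SGE sets the NSLOTS variable to the number of slots actually granted. The following sketch simulates a grant of four slots locally to show how it maps onto the OpenMP thread count:

```shell
#!/bin/bash
# NSLOTS is normally set by SGE inside the job; we simulate 4 granted slots
NSLOTS=4

# tell the OpenMP runtime to use exactly as many threads as granted slots
export OMP_NUM_THREADS=$NSLOTS
echo "OpenMP will use $OMP_NUM_THREADS threads"
```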

Observing a job (qstat)

qstat [options]

Options of qstat:

-u user Prints all jobs of a given user.
-j job-id Prints full job information for the job with the given job-id. Here you can see the reason why your job is pending.
-f Prints all queues and jobs.
-help Prints all possible qstat options.

For pending jobs, you can also get hints on why the job with the job identifier 'job-id' is still waiting in the queue by executing

qalter -w p job-id

Deleting a job (qdel)

qdel job-id

Delete a job with the job identifier "job-id".

Status information and resource limitations

Obtaining current host status (qhost)

To obtain current status information for the cluster execution hosts and their configuration parameters, execute qhost or for a more substantial representation:

qhost -F

Slot limitations (qquota)

Due to the limited resources, the number of available slots per user is restricted to 256 slots for power users and 160 slots for standard users. Execute

qquota

to see your current resource consumption.

Note: Please contact the ITP cluster administration if you need more resources for the progress of an urgent project.

Submitting interactive jobs (qsh)

The submission of interactive jobs is useful in situations where a job requires some sort of direct intervention. This is usually the case for X-Windows applications or in situations in which further processing depends on your interpretation of immediate results. A typical example for both of these cases is a graphical debugging session.

The only supported method for interactive sessions on the Opteron cluster is currently to start an interactive X-Windows session via the SGE's qsh command. This will bring up an xterm from the executing node, with the display directed either to the X server indicated by your current DISPLAY environment variable or to the one specified with the -display option. Try qsh -help for a list of allowable options to qsh. You can also force qsh to use the options specified in an optionfile with:

qsh -@ optionfile

A valid "optionfile" might contain the following lines:

#Name your job
-N my_name

#Export some of your current environment variables
-v var1[=val1],var2[=val2],...

#Use the current directory as working directory
-cwd

Note: Interactive jobs are not spooled if the necessary resources are not available: either your job is started immediately, or you are notified to try again later.

Interactive sequential jobs

Start an interactive session for a sequential program by executing:

qsh -q all.q

Prepare your session as needed, e.g. by loading all necessary modules within the provided xterm and then simply start your sequential program on the executing node.

Interactive parallel jobs

For a parallel program execute

qsh -q par.q -pe parallel-environment number-of-processes

with the SGE's parallel environment of your choice (see the list of available parallel environments with qconf -spl) and the number of processes you plan to debug on, just as if submitting a parallel job with qsub to the SGE's parallel queue.

Start your parallel MPI program as depicted within the "script.sh" files for parallel MPI batch jobs above. For OpenMP jobs export the OMP_NUM_THREADS variable with export OMP_NUM_THREADS=$NSLOTS and start your job.

High priority interactive sessions

If your job is urgent and the necessary resources are temporarily unavailable, submit your interactive session to the developers' queue with

qsh -q dev.q

Note that the developers' queue has a much stricter time limit than the other available queues. Please make sure to end high-priority sessions as soon as they are no longer needed.

 

Adapted from http://www.uibk.ac.at/zid/systeme/hpc-systeme by courtesy of the ZID HPC Team.