SGE tutorial

Submitting jobs (qsub)

To use the compute nodes in our HPC clusters, you submit jobs to the batch job scheduling system SGE (formerly Sun Grid Engine, now its open source successor Son of Grid Engine). As a user you may want to become familiar with the following commands: qsub, qstat, qdel, qhost and qsh, which are briefly described in the following sections.

For more information see the respective man pages of these commands. Detailed information may be found at the Son of Grid Engine project page.

The command qsub is used to submit jobs to the batch-system. qsub uses the following syntax:

qsub [ -q std.q ] [options] job_script [ job_script_arguments ...]

where job_script represents the path to either a binary (in which case the qsub option -b y is required) or (preferably) a simple shell script containing the commands to be run on the remote cluster nodes.

The -q option specifies the name of the queue to be used. As of December 2016, std.q is the default. You may list the available queues with qconf -sql.

SGE will start your job on any nodes that have the necessary resources available, or keep it queued until the requested resources become available. SGE uses recent resource consumption when deciding which queued jobs to start first ("fair share" scheduling). After submitting your job, you may continue to work or log out - job scheduling is completely independent of interactive work. The only way to stop a job after it has been submitted is the qdel command described below.

If you submit more than one job at the same time, you need to make sure that individual jobs (which may be executed simultaneously) do not interfere with each other, e.g. by writing to the same files.
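One simple way to keep simultaneously running jobs out of each other's files is to derive file names from the JOB_ID environment variable, which SGE sets to the job's unique id. A minimal sketch (the program name is a placeholder; the fallback to the shell PID is only there so the snippet also runs outside SGE):

```shell
#!/bin/bash
# Derive a per-job output file name from JOB_ID, which SGE sets for
# every batch job; fall back to the shell PID outside SGE.
RESULT="result_${JOB_ID:-$$}.dat"
echo "writing results to $RESULT"
# ./your_program > "$RESULT"   # placeholder for your actual program
```

With this pattern, two instances of the same script submitted at the same time write to different files.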

Note that we currently only support Bash (/bin/bash). Your script may use common Bash functionality such as I/O redirection using the < and > characters, loops, case constructs etc., but please keep it simple. If your setup uses a different shell or needs a complex script, simply call your script from within the batch job script.

The options tell the qsub command how to behave: job name, where output is written, use of main memory and run time, parallelization method, etc.

There are two ways to supply options to the qsub command:

Method 1:

You may add the options directly to the qsub command line, like:

qsub -cwd -q std.q -o output.dat -e error.dat job_script [ argument ... ]

Method 2 (recommended):

Add the qsub options to the beginning of your job_script, one option per line. These are automatically prepended to the qsub command line:

qsub job_script [ argument ... ]

Note that the lines prefixed with #$ are parsed by the qsub command, but are treated as comments by the shell.

Taking the above example, the contents of job_script would look like this:

#!/bin/bash

#$ -q std.q
#$ -cwd
#$ -o output.dat
#$ -e error.dat

./your_commands

Overview of commonly used options to qsub

Queue
-q queuename

Select queue.
Get a listing of queues with qconf -sql,
display queue parameters with qconf -sq queuename.
Currently the following queues are defined:

std.q
This is the default. General purpose queue. Default/maximum runtime: 240 hours.
short.q
For small test jobs. Limited number of CPU slots. Default/maximum runtime: 10 hours.
bigmem.q
Leo3e only. Jobs with high main memory requirements. Will run on the nodes equipped with 512GB of memory. Default/maximum runtime: 240 hours.
Job Name, Input, Output
-N name Name the job.
Default: File name of script.
The job name is also reflected in the default file names for standard output and standard error (see below).
-o opath Standard output will be appended to file opath.
If you plan to run multiple instances of a job at the same time, using this option is not recommended (interleaving and possible loss of output data).
Default: name.ojob_id.
Here, name is the job's name (see above), and the unique job_id is automatically created by the system for each job.
-e epath Standard error will be appended to epath.
Default: name.ejob_id.
name and job_id are the same as above.
-j yes|no Join standard error to standard output (yes or no)
-i path Standard input file
-cwd Execute the job in the current working directory. If you omit this option, your job will execute in $HOME, which is usually a bad idea. Input/output file names are relative to this directory.
Notification
-M email_address Notifications will be sent to this email address.
-m [b|e|a|s|n] Send notifications for any combination of the following events:
begin, end, abort, suspend, no mail (default)
Do not forget to specify an email address (with -M) if you want to get these notifications.
Resources
-l h_rt=[hours:minutes:]seconds Requested real time (wall-clock time from start to termination); the default (= maximum) depends on the system and, if applicable, the specified queue.
-l h_vmem=size[M|G] Request a per-slot memory limit of size bytes (or megabytes/gigabytes with the M/G suffix).
The total requested memory is size multiplied by the number of requested slots. See the description of parallel environments below.
-l h_stack=size[M|G] Request a per-slot stack size limit of size bytes (or megabytes/gigabytes with the M/G suffix). This parameter is typically needed if your programs allocate large amounts of memory on the stack (e.g. large dynamically sized local variables in Fortran programs).
-hold_jid job-id Start this job only after the job with the id job-id has finished.
Parallel jobs / parallel environments
-pe parallel-environment number-of-slots

If you run parallelized programs (MPI or shared memory), you need to specify a parallel environment and the number of processes/threads (= SGE slots) on which your parallel (MPI/OpenMP) application should run. By selecting a parallel environment you can also control how jobs are distributed across nodes. For a list of available parallel environments on the system execute:

qconf -spl

If you omit the -pe option, SGE assumes that your job is sequential.

Please note: The -pe option only reserves CPU-cores for your job. You need to make sure that your program actually starts as many processes or threads as you requested.

The following types of parallel-environment are available:

openmpi-Xperhost Each host gets X processes (number-of-slots must be a multiple of X).
openmpi-fillup The batch system fills up each available host with processes up to its slot limit.
openmp This environment should be chosen when working with threaded applications (e.g. OpenMP).
Job Arrays
-t 1-n Trivial parallelisation using a job array. Start n independent instances of your job (e.g. for extensive parameter studies). When the job is run, you use the environment variable $SGE_TASK_ID, which is set to a unique integer value from 1 .. n, to distinguish between the individual job instances (e.g. to initialize a random number generator, select an input file or compute parameter values).
Other useful options
-w v check whether the syntax of the job is okay (do not submit the job)
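The -t job-array option above can be sketched as a complete job script. Each of the 20 task instances receives its own SGE_TASK_ID and uses it to select an input file (file and program names are placeholders; the fallback to task 1 only exists so the snippet also runs outside SGE):

```shell
#!/bin/bash
# Sketch of a job array for a parameter study: SGE starts 20
# independent instances of this script.
#$ -q std.q
#$ -cwd
#$ -N param_study
#$ -t 1-20

# SGE sets SGE_TASK_ID to a value from 1..20 for each instance;
# fall back to 1 so the snippet also runs outside SGE.
TASK=${SGE_TASK_ID:-1}
INPUT="input_${TASK}.dat"
echo "task $TASK reads $INPUT"
# ./your_program "$INPUT" > "result_${TASK}.dat"   # placeholder
```

Because every instance writes to its own result file, the tasks do not interfere with each other.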


There are differences to consider between the various supported (parallel) programming models. The following examples illustrate the different procedures:

Sequential jobs

qsub job_script

where the contents of job_script may look like this:
(if you just copy&paste this example please be aware of line breaks and special characters)

#!/bin/bash

# Name your job. Unless you use the -o and -e options, output will
# go to a unique file name.ojob_id for each job.
#$ -N name

# Execute job in the queue "std.q" unless you have special requirements.
#$ -q std.q

# The SGE batch system uses the current directory as working directory.
# Both files (output.dat and error.dat) will be placed in the current
# directory. The batch system expects to find the executable in this directory.
#$ -cwd

# Redirect output stream to this file.
#$ -o output.dat

# Redirect error stream to this file.
#$ -e error.dat

# Send status information to this email address.
#$ -M Karl.Mustermann@xxx.com

# Send an e-mail when the job is done.
#$ -m e

# For example an additional script file to be executed in the current
# working directory. In such a case assure that script.sh has 
# execute permission (chmod +x script.sh).
./script.sh

Parallel MPI jobs

qsub job_script

where the contents of job_script may look like this:
(if you just copy&paste this example please be aware of line breaks and special characters)

#!/bin/bash

# Execute job in the queue "std.q" unless you have special requirements.
#$ -q std.q

# The batch system should use the current directory as working directory.
#$ -cwd

# Name your job. Unless you use the -o and -e options, output will
# go to a unique file name.ojob_id for each job.
#$ -N name

# Redirect output stream to this file.
#$ -o output.dat

# Join the error stream to the output stream.
#$ -j yes

# Send status information to this email address. 
#$ -M Karl.Mustermann@xxx.com

# Send me an e-mail when the job has finished. 
#$ -m e

# Specify the amount of virtual memory given to each MPI process
# in the job.
#$ -l h_vmem=1G

# Use the parallel environment "openmpi-fillup", which fills up each host
# with as many processes as it has free slots. Start 16 MPI processes
# across an arbitrary number of hosts. For each process, SGE will reserve
# one CPU core.
#$ -pe openmpi-fillup 16

## ALTERNATIVE
# Use the parallel environment "openmpi-fillup" as above, but with a slot
# range. If there are not enough free cores to run the MPI job on the
# maximum of 16 cores, the batch system will gradually try to run the job
# on fewer cores, but not less than 8.
##  #$ -pe openmpi-fillup 8-16

mpirun -np $NSLOTS ./your_mpi_executable [extra arguments]

Parallel OpenMP jobs

qsub job_script

where the contents of job_script may look like this:
(if you just copy&paste this example please be aware of line breaks and special characters)

#!/bin/bash

# Name your job. Unless you use the -o and -e options, output will
# go to a unique file name.ojob_id for each job.
#$ -N name

# Execute job in the queue "std.q" unless you have special requirements.
#$ -q std.q

# The batch system should use the current directory as working directory.
#$ -cwd

# Redirect output stream to this file.
#$ -o output.dat

# Join the error stream to the output stream.
#$ -j yes

# Send status information to this email address. 
#$ -M Karl.Mustermann@xxx.com

# Send me an e-mail when the job has finished. 
#$ -m e

# Use the parallel environment "openmp" with 8 job slots. Each requested
# job slot will be assigned one core on the execution host.
#$ -pe openmp 8

# Allocate 2 Gigabytes per job slot.
# The total memory available to your program
# (i.e. the UNIX "ulimit -v" value) will be the
# product of job slots from the -pe directive
# times the h_vmem requirement. For the present
# example, the job will get 16GB of virtual memory.
#$ -l h_vmem=2G

# tell OpenMP how many threads to start
export OMP_NUM_THREADS=$NSLOTS
./your_openmp_executable

Important: If your job uses shared memory parallelization other than OpenMP, you will still use the -pe openmp environment, but you need to ensure that the number of CPU-intensive threads is consistent with the number of slots assigned to the job ($NSLOTS). If you start more threads than you requested in the -pe directive, these may interfere with other users' processes, possibly degrading the overall efficiency of large parts of the system. Many parallel programs by default automatically discover the number of cores installed on the system and will start as many threads. You will need to find out how to override this behaviour (quite software-dependent).
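As a sketch of that last point, you can propagate the granted slot count to the threading libraries your program may use before starting it. NSLOTS is set by SGE inside the job; the MKL and OpenBLAS variable names are assumptions that only matter if your program links those libraries:

```shell
#!/bin/bash
# Limit the thread count of common threaded libraries to the number of
# slots granted by the -pe directive. NSLOTS is set by SGE inside the
# job; the fallback to 1 lets the snippet run outside SGE.
# MKL_NUM_THREADS / OPENBLAS_NUM_THREADS are assumptions that apply
# only if your program uses those libraries.
SLOTS=${NSLOTS:-1}
export OMP_NUM_THREADS=$SLOTS
export MKL_NUM_THREADS=$SLOTS
export OPENBLAS_NUM_THREADS=$SLOTS
echo "thread limit: $OMP_NUM_THREADS"
```

If your software does not honour such environment variables, consult its documentation for its own thread-count setting.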

Note that it is no use asking for more slots than there are cores on the largest machines in the cluster: SGE will never be able to start such a job.

Submitting interactive jobs (qsh)

The submission of interactive jobs is useful in situations where a job requires some sort of direct intervention. This is usually the case for X-Windows applications or in situations in which further processing depends on your interpretation of immediate results. A typical example for both of these cases is a graphical debugging session.

Note: Interactive sessions are particularly helpful for getting acquainted with the system or when building and testing new programs.

The only supported method for interactive sessions on the cluster is currently to start an interactive X-Windows session via SGE's qsh command. This will bring up an xterm from the executing node, with the display directed either to the X server indicated by your current DISPLAY environment variable or to the one specified with the -display option. Try qsh -help for a list of allowable options to qsh. You can also force qsh to use the options specified in an option file with qsh -@ optionfile. A valid option file might contain the following lines:

# Select queue
-q std.q

# Name your job
-N my_name

# Export some of your current environment variables
-v var1[=val1],var2[=val2],...

# Use the current directory as working directory
-cwd


Interactive jobs are not spooled if the necessary resources are not available: either your job starts immediately, or you are notified to try again later. Also, interactive jobs will always fail if the implemented transient slot limits (see the section "Slot limitations" in the resource requirements and limitations tutorial for more information) are exceeded. In such cases, submit your interactive session with the option

qsh -now n [...]

Note: Make sure to end your interactive sessions as soon as they are no longer needed!

Interactive sequential jobs

Start an interactive session for a sequential program simply by executing

qsh

Prepare your session as needed, e.g. by loading all necessary modules within the provided xterm and then start your sequential program on the executing node.

Interactive parallel jobs

For a parallel program execute

qsh -pe parallel-environment number-of-slots

with the SGE's parallel environment of your choice (see the list of available parallel environments with qconf -spl) and the number of processes/threads you intend to use. This is not different from submitting a parallel job with qsub.
Start your parallel MPI program as shown in the job scripts for parallel MPI batch jobs above. For OpenMP jobs, export the OMP_NUM_THREADS variable with export OMP_NUM_THREADS=$NSLOTS and start your job.

Monitoring jobs (qstat)

To get information about running or waiting jobs use

qstat [options]

To shorten the output of qstat, execute either qstat | grep -v hqw to filter out all pending jobs in hold state, or qstat -s r to display running jobs only.

Other options of qstat:

-u user Print all jobs of the given user. Typical usage: -u $USER (print all of your own jobs).
-j job-id Prints full information of the job with the given job-id.
-f Prints all queues and jobs.
-help Prints all possible qstat options.


In case of pending jobs, you might also get some hints on why your job with the job identifier job-id is still waiting in queue, by executing

qalter -w p job-id

You can also verify a submitted job with

qalter -w v job-id

If the previous command delivers the following message, there's something wrong with the job and it will never be able to run:

verification: no suitable queues

Deleting a job (qdel)

To delete a job with the job identifier job-id, execute

qdel job-id
