Matlab

Description

Matlab is a high-level language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

External documentation:

Usage

Starting the Matlab GUI on our HPC systems is as simple as loading the appropriate module (module load matlab/version) and then executing matlab in your shell. To get a brief summary of all command line options, execute matlab -help. Please refer to the official Matlab documentation or consult the Matlab helpdesk (issue the helpdesk command within your Matlab Command Window) for general usage information.

The following sections briefly describe how to reserve Matlab licenses for SGE batch jobs and how to compile a standalone application from your m-files.

Note: Matlab sometimes requires very large amounts of virtual memory, so don't be surprised if your job gets killed when starting Matlab (especially the Matlab GUI). You can prevent this by reserving enough memory for your job; the table in the section "Matlab virtual memory requirements" lists the requirements for our various platforms. Handling parallel/multithreaded Matlab functionality appropriately is also not easy. Refer to the section "Matlab resource requirements" for hints on both problems and for the resource requirements of Matlab jobs on our various HPC systems.

Reserving Matlab licenses for SGE batch jobs

When submitting a job to the SGE batch scheduler, please specify the Matlab license(s) for your applications by using the following SGE option(s):

-l lic_matlab=1 [-l lic_matlab_dct=#NODES]

You need one base license (lic_matlab) for each of your non-compiled jobs. In addition, you need one license per node (lic_matlab_dct - currently only one node is supported!) when using the Distributed/Parallel Computing Toolbox.

Note: Unfortunately, this method is not completely fault-tolerant. It does, however, reduce the risk that your jobs die at runtime because no license is available. The recommended method for non-interactive Matlab jobs is to use the Matlab Compiler to create a standalone application, which requires no licenses at runtime.

Note (PBS): There is no equivalent license integration within the PBS batch scheduler yet. Don't specify any license statements in your PBS job scripts.
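
As an illustration of the license request above, the following sketch prints a minimal SGE job script for an interpreted (non-compiled), sequential Matlab run. The function name print_matlab_job, the job name matlab_job and the two arguments are hypothetical; only the lic_matlab request, the module load call and the Matlab options are taken from this page. Treat it as a starting point under these assumptions, not as a site-approved template.

```shell
# Print a minimal SGE job script that reserves one Matlab base license.
#   $1 = Matlab module version (placeholder, as in "module load matlab/version")
#   $2 = Matlab command to run (hypothetical example input)
print_matlab_job() {
    cat <<EOF
#!/bin/bash
#$ -N matlab_job
#$ -cwd
#$ -l lic_matlab=1
module load matlab/$1
matlab -nodisplay -nojvm -singleCompThread -r '$2'
EOF
}
```

A possible use is print_matlab_job version 'my_matlab_command my_arg1' > matlab_job.sge, followed by qsub matlab_job.sge. Remember that a memory request suitable for your system still has to be added to the generated script.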

Compiling a standalone application from your m-files

The main advantage of compiling standalone applications from your Matlab projects is that no Matlab licenses are required at runtime. That is, once you have a compiled executable, your jobs will not fail due to license bottlenecks.

Perform the following steps on the login node of the cluster on which you intend to run your Matlab application. Once you have created the standalone executable, you can use it for as many jobs as you wish. You need to re-compile your program only after making changes to it.

A sequential, non-interactive standalone executable without JVM functionality is generated from the main m-file my_project.m with

mcc -m -v -R '-nodisplay,-nojvm,-singleCompThread' my_project.m

This should create the executable my_project. You can then start your application simply by executing ./my_project extra_params within the current working directory.

Note: The mcc command also creates the additional files readme.txt and run_my_project. These are meant for running your executable on machines where Matlab is not installed and can be ignored or deleted. The command module load matlab/version sets all environment variables needed to execute your binary directly.
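
Since re-compilation is only needed after changes, a small helper can decide whether the mcc call is necessary at all. The sketch below is one possible approach; the function names needs_recompile and compile_if_stale are hypothetical. It assumes that you run it in the directory containing the main m-file and that only this main m-file is checked, so changes in other m-files of your project are not detected.

```shell
# Succeed (return 0) if the executable is missing or older than the m-file.
#   $1 = main m-file, $2 = executable
needs_recompile() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# Run the mcc call from the text above only when the executable is stale.
#   $1 = main m-file, e.g. my_project.m (executable name: my_project)
compile_if_stale() {
    exe=${1%.m}
    if needs_recompile "$1" "$exe"; then
        mcc -m -v -R '-nodisplay,-nojvm,-singleCompThread' "$1"
    else
        echo "$exe is up to date"
    fi
}
```

Calling compile_if_stale my_project.m on the login node, after module load matlab/version, then compiles only when my_project.m is newer than my_project.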

For a parallel, non-interactive standalone executable without JVM functionality omit the -singleCompThread option.
WARNING: When running parallel Matlab applications make sure that the number of parallel threads is well under your control. See the upcoming sections for more information.

Concurrent startup of multiple compiled Matlab instances

When a compiled Matlab program is started, certain runtime components are unpacked into your $HOME directory and removed after the program has finished. When multiple instances of such compiled Matlab jobs are executed simultaneously (e.g. in a job array), they may interfere with each other, causing seemingly random failures.

To redirect the unpacking directory to a unique, local location, modify your job script as follows:

For SGE (LEO clusters):
export MCR_CACHE_ROOT=/tmp/mcr-$USER-$JOB_ID-$SGE_TASK_ID-$RANDOM
./matlab_executable
rm -rf "$MCR_CACHE_ROOT"

For PBS (MACH):
export MCR_CACHE_ROOT=/tmp/mcr-$USER-$PBS_JOBID-$PBS_TASKNUM-$RANDOM
./matlab_executable
rm -rf "$MCR_CACHE_ROOT"

Setting MCR_CACHE_ROOT to a local directory also helps speed up the start of your compiled application compared to unpacking in $HOME, which resides on NFS mounted storage.
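
The same idea can be wrapped in a small shell function. This is only a sketch under a few assumptions: the name run_with_private_cache is hypothetical, mktemp -d is used instead of $RANDOM to obtain a directory name that is guaranteed to be unique, and the placeholder nojob is used when JOB_ID is not set. The directory is removed even if the program exits with an error, but not if the job is killed by a signal.

```shell
# Run a command with its own private MCR cache directory and clean up afterwards.
# Usage: run_with_private_cache ./matlab_executable [args...]
run_with_private_cache() {
    # Create a unique directory below /tmp (or $TMPDIR if it is set).
    mcr_dir=$(mktemp -d "${TMPDIR:-/tmp}/mcr-${USER:-$(id -un)}-${JOB_ID:-nojob}-XXXXXX") || return 1
    mcr_status=0
    # MCR_CACHE_ROOT is set only for the command being started.
    MCR_CACHE_ROOT="$mcr_dir" "$@" || mcr_status=$?
    rm -rf "$mcr_dir"
    # Pass the exit status of the command on to the caller.
    return "$mcr_status"
}
```

In a job script, the three lines shown above would then shrink to run_with_private_cache ./matlab_executable.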

For more background information, please see http://undocumentedmatlab.com/blog/speeding-up-compiled-apps-startup (this has not been thoroughly tested at UIBK - feedback is welcome).

Matlab resource requirements

As Matlab requires massive amounts of virtual memory and often behaves unexpectedly with regard to the amount of parallelism, it is essential to ensure that your application integrates well with the batch system, and to know and specify the resources your job requires.

Please make sure in advance that your program is well behaved, by examining thoroughly how many threads/processes your application starts, as well as what the overall resource consumption of your program will be.

Evaluating your application

The following command (executed on the same node as the matlab command) might be helpful in evaluating your application:

ps -C MATLAB -o user,pid,vsz,pcpu,nlwp,cmd

The VSZ column shows the amount of virtual memory consumed by each MATLAB process. %CPU indicates the average CPU utilization, which (summed over all MATLAB processes of your job) must never exceed 100 times the number of slots/cores reserved for your job. NLWP shows the number of threads spawned by each MATLAB process.

For a compiled application replace MATLAB with the name of your executable:

ps -C your_executable -o user,pid,vsz,pcpu,nlwp,cmd
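
The %CPU rule above (the sum over all processes must not exceed 100 times the number of reserved slots) can be checked automatically. The following is a minimal sketch; the function names within_cpu_budget and matlab_cpu_ok are hypothetical, and the check is only as accurate as the averaged %CPU values reported by ps.

```shell
# Read %CPU values (one per line) from stdin and fail if their sum
# exceeds 100 times the number of reserved slots.
#   $1 = number of reserved slots/cores
within_cpu_budget() {
    awk -v slots="$1" '{ total += $1 } END { if (total > 100 * slots) exit 1 }'
}

# Apply the check to all running processes with a given name.
#   $1 = process name (MATLAB or your executable), $2 = reserved slots
matlab_cpu_ok() {
    ps -C "$1" -o pcpu= | within_cpu_budget "$2"
}
```

For example, matlab_cpu_ok MATLAB 4 || echo "MATLAB uses more than 4 cores" could be run on the node where your job is executing.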

Matlab virtual memory requirements

The table below lists the minimum amount of virtual memory that is required for a sequential Matlab session and for each additional parallel thread on our various HPC systems. Make sure to specify enough memory for your Matlab jobs (add at least a few hundred megabytes for shell overhead and the like).

Required virtual memory [GB]

Session type                                               LEO1  LEO2  LEO3  MACH
Plain Matlab GUI (matlab)                                   1.2   1.2   4.1   1.1
Matlab without GUI (matlab -nodisplay)                      1.0   1.1   2.4   1.0
Matlab without JVM (matlab -nojvm, implies -nodisplay)      0.6   0.6   0.8   0.6
Each additional parallel Matlab process (matlabpool <N>)    1.3   1.2   3.0   1.3


Note: The exact virtual memory requirement depends on a variety of factors, such as the Matlab version in use. The table above provides only rough estimates.
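
To turn the table into a memory request, add the value for the session type, the value per additional parallel process times the number of processes, and some overhead. The sketch below does this arithmetic; the function name matlab_vmem_gb is hypothetical and the overhead of 0.3 GB used in the example is an assumption (a few hundred megabytes), not a measured value.

```shell
# Estimate the virtual memory (in GB) to request for a Matlab job.
#   $1 = base requirement of the session type (from the table)
#   $2 = requirement per additional parallel Matlab process (from the table)
#   $3 = number of additional parallel processes
#   $4 = extra overhead for the shell etc. (assumed value)
matlab_vmem_gb() {
    awk -v base="$1" -v per="$2" -v n="$3" -v extra="$4" \
        'BEGIN { printf "%.1f\n", base + n * per + extra }'
}

# Example: Matlab without GUI on LEO3 with four additional parallel processes:
# 2.4 + 4 * 3.0 + 0.3 = 14.7
matlab_vmem_gb 2.4 3.0 4 0.3
```

As the table only gives rough estimates, the result should be read as a lower bound to start from rather than an exact figure.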

Parallel/multithreaded Matlab use

If you explicitly use Matlab's parallel functionality, or if you use any parallelized Matlab feature at all (for example the multithreading built into the singular value decomposition), it is essential to control the number of threads your application spawns. Otherwise, your application will interfere uncontrollably with itself or with other jobs, and performance will suffer unpredictably.

The following subsections describe some ways to bring parallel Matlab programs under control.

Enforcing sequential execution

If you are unsure about the parallelism of your Matlab code and you don't use parallel functionality explicitly, play it safe and enforce the sequential execution of your application. This will result in more predictable output and runtime values as well as - in many situations - better performance.

Run your matlab command with the additional -singleCompThread runtime option, such as:

matlab -singleCompThread [-nojvm -nodisplay] -r 'my_matlab_command my_arg1 ...'

or compile your application with this runtime option explicitly set. See above.

Note: With older versions of Matlab, setting -singleCompThread is not always sufficient. In that case, additionally try the methods described in the following sections.

Limiting the maximum number of threads explicitly

Although MathWorks has declared the maxNumCompThreads function deprecated, we still recommend using it, as it has proved to work in several scenarios and with different versions of Matlab, including the newest ones.

Add the following lines to your main Matlab function, thereby controlling the maximum number of threads via the NSLOTS environment variable:

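ncpu = str2double(getenv('NSLOTS'));   % NaN if NSLOTS is unset or not a number
if isnan(ncpu), ncpu = 1; end          % assumption: fall back to a single thread
maxNumCompThreads(ncpu);
fprintf('Set number of cpus to %i\n', ncpu);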


When using the SGE batch scheduler, the environment variable NSLOTS is usually appropriately set to the number of available cores/slots by the chosen parallel environment. Just make sure that you have requested the correct parallel environment in your job file:

...
# SGE resource requirement for OpenMP job with <X> threads
#$ -pe openmp <X>
...

where <X> is the number of desired threads.

With the PBS batch scheduler on MACH, NSLOTS is not set automatically. If you are using a Matlab version prior to R2014a, please set NSLOTS as described in the following job script fragment:

...

# Select statement for an OpenMP job with <X> threads
#PBS -l select=1:ncpus=<X>

# Set NSLOTS to $NCPUS
export NSLOTS=$NCPUS

...

Starting with R2014a, Matlab correctly determines the number of processors in the CPU set it is running in, so the maxNumCompThreads workaround is no longer necessary on MACH if you use the newest version of Matlab.
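
For Matlab versions that still need the workaround, a job script shared between SGE and PBS could determine the thread count as sketched below. The function name resolve_nslots is hypothetical, and the fallback to one thread when neither NSLOTS nor NCPUS is set is an assumption chosen to be on the safe (sequential) side.

```shell
# Print the number of threads to use:
# NSLOTS (set by SGE) if available, otherwise NCPUS (PBS), otherwise 1.
resolve_nslots() {
    if [ -n "${NSLOTS:-}" ]; then
        echo "$NSLOTS"
    elif [ -n "${NCPUS:-}" ]; then
        echo "$NCPUS"
    else
        echo 1
    fi
}
```

Placing export NSLOTS=$(resolve_nslots) before the Matlab call then makes NSLOTS available to the Matlab lines shown above on either batch system.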

Influencing the number of threads via environment variables

The following environment variables might also influence the parallel/multithreaded behavior of Matlab.

Environment variable   Value   Function
MKL_DYNAMIC            FALSE   Disables dynamic adjustment of the number of threads in MKL-enabled functions
MKL_NUM_THREADS        N       Explicitly sets the number of MKL threads to N
OMP_DYNAMIC            FALSE   Disables dynamic adjustment of the number of threads in OpenMP-enabled functions
OMP_NUM_THREADS        N       Explicitly sets the number of OpenMP threads to N
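
In a job script, these variables could be set as in the following sketch before starting Matlab or your compiled executable. Using NSLOTS as the value of N and falling back to one thread if it is not set are assumptions; whether Matlab actually honors the variables may depend on the Matlab version, so verify the effect with the ps command described earlier.

```shell
# Use the number of reserved slots as thread limit; assume 1 if NSLOTS is not set.
threads=${NSLOTS:-1}

# Fix the number of MKL and OpenMP threads and disable dynamic adjustment.
export MKL_DYNAMIC=FALSE
export MKL_NUM_THREADS="$threads"
export OMP_DYNAMIC=FALSE
export OMP_NUM_THREADS="$threads"
```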