Matlab

Description

Matlab is a high-level language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

External documentation:

Usage

To start the Matlab GUI on our HPC systems, simply load the appropriate module (module load matlab/version) and then run matlab in your shell. To get a brief summary of all command line options, run matlab -nodisplay -help. Please refer to the official Matlab documentation or consult the Matlab helpdesk (issue the helpdesk command within your Matlab Command Window) for general usage information.

Note: Matlab sometimes requires very large amounts of virtual memory, so a job may be killed while Matlab is starting (especially the Matlab GUI) if its memory limit is too low.

You can help prevent this by doing the following:

  • Run your myprogram.m M-file with
    matlab -nodisplay -nojvm -batch myprogram
    to avoid unnecessary overhead.
  • Monitor a sample run on the login node using the top command, noting the VIRT (virtual memory) consumption.
  • Specify sufficient virtual memory using the -l h_vmem submit option of SGE.
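
The last two steps can be combined with a little arithmetic. The following is a minimal sketch only: the function name h_vmem_from_vsz and the default headroom of 300 MB are assumptions made for illustration, not site policy. It converts a peak VIRT/VSZ reading in KiB (the unit used by ps) into a value that could be passed to -l h_vmem.

```shell
#!/bin/bash
# Sketch: convert a peak virtual memory reading (KiB) into an h_vmem value.
# h_vmem_from_vsz and the 300 MB default headroom are hypothetical choices.
h_vmem_from_vsz() {
    local vsz_kib=$1
    local headroom_mb=${2:-300}
    # Round the KiB reading up to whole MiB, then add the headroom.
    local mb=$(( (vsz_kib + 1023) / 1024 + headroom_mb ))
    echo "${mb}M"
}

# Example: a sample run peaked at 1258291 KiB (about 1.2 GB).
h_vmem_from_vsz 1258291
```

The printed value is only as good as the sample run it is based on, so a production run with larger data may need more.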

Many Matlab algorithms run in parallel by default using Matlab's multithreading capabilities. If you want to run Matlab in a sequential job, use the -singleCompThread option to prevent Matlab from grabbing all installed CPUs.

To run Matlab in parallel, see the section "Running Matlab in a parallel environment" below, which explains how to make Matlab use exactly as many threads as you requested with SGE's -pe openmp nproc command line option.

Managing Usage of Matlab Licenses

Note: As of mid-2020, the University of Innsbruck has an unlimited Matlab campus license, so compiling Matlab programs for license economy is currently not necessary. This may change again in the future.

Running a large number of Matlab processes in parallel uses one license per host, which leads to shortages when the number of available licenses is limited. The following sections briefly describe two ways to work around this: reserving Matlab licenses for SGE batch jobs, and compiling a standalone application from your m-files.

Reserving Matlab licenses for SGE batch jobs

When submitting a job to the SGE batch scheduler, please specify the Matlab license(s) for your applications by using the following SGE option(s):

-l lic_matlab=1 [-l lic_matlab_dct=#NODES]

You need one base license (lic_matlab) for each of your non-compiled jobs. In addition, you need one license per node (lic_matlab_dct - currently only one node is supported!) when using the Distributed/Parallel Computing Toolbox.
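
As an illustration, the license options could be assembled before submitting a job. This is a sketch under stated assumptions: the helper name matlab_license_opts and the job script name job.sh are hypothetical, and the qsub command is only printed, not executed. The resource names lic_matlab and lic_matlab_dct are the ones given above.

```shell
#!/bin/bash
# Sketch: build the SGE license options for a Matlab job.
# matlab_license_opts and job.sh are hypothetical names.
matlab_license_opts() {
    local dct_nodes=${1:-0}
    # Every non-compiled job needs one base license.
    local opts="-l lic_matlab=1"
    # Add the toolbox license only when the Distributed/Parallel
    # Computing Toolbox is used (one per node).
    if [ "$dct_nodes" -gt 0 ]; then
        opts="$opts -l lic_matlab_dct=$dct_nodes"
    fi
    echo "$opts"
}

# Print the submit command for inspection instead of running it.
echo "qsub $(matlab_license_opts 1) job.sh"
```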

Note: Unfortunately, this method is not completely fault-tolerant. It does, however, reduce the risk that your jobs die at runtime because licenses are unavailable. The recommended method for non-interactive Matlab jobs is to use the Matlab Compiler to create a standalone application, which requires no licenses at runtime.

Note (PBS): There is no equivalent license integration within the PBS batch scheduler yet. Don't specify any license statements in your PBS job scripts.

Compiling a standalone application from your m-files

The main advantage of compiling standalone applications from your Matlab projects is that no Matlab licenses are required at runtime. That is, once you have a compiled executable, your jobs will not fail due to license bottlenecks.

Perform the following steps on the login node of the cluster on which you intend to run your Matlab application. Once you have created the standalone executable, you can use it for as many jobs as you wish. You need to re-compile your program only after making changes to it.

A sequential, non-interactive standalone executable without JVM functionality is generated from the main m-file my_project.m with

mcc -m -v -R '-nodisplay,-nojvm,-singleCompThread' my_project.m

This should create the executable my_project. You can then start your application simply by executing ./my_project extra_params within the current working directory.

Note: The mcc command also creates the additional files readme.txt and run_my_project. These are intended for using your executable on machines where Matlab is not installed and can be ignored or deleted. The command module load matlab/version sets all necessary environment variables for successful direct execution of your binary.

For a parallel, non-interactive standalone executable without JVM functionality omit the -singleCompThread option.
WARNING: When running parallel Matlab applications make sure that the number of parallel threads is well under your control. See the upcoming sections for more information.

Concurrent startup of multiple compiled Matlab instances

When a compiled Matlab program is started, certain runtime components are unpacked into your $HOME directory and removed after the program has finished. When multiple instances of such compiled Matlab jobs are executed simultaneously (e.g. in a job array), they may interfere with each other, causing seemingly random failures.

To redirect the unpacking directory to a unique, local location, modify your job script as follows:

For SGE (LEO clusters):

export MCR_CACHE_ROOT=/tmp/mcr-$USER-$JOB_ID-$SGE_TASK_ID-$RANDOM
./matlab_executable
rm -rf "$MCR_CACHE_ROOT"

For PBS (MACH):

export MCR_CACHE_ROOT=/tmp/mcr-$USER-$PBS_JOBID-$PBS_TASKNUM-$RANDOM
./matlab_executable
rm -rf "$MCR_CACHE_ROOT"

Setting MCR_CACHE_ROOT to a local directory also helps speed up the start of your compiled application compared to unpacking in $HOME, which resides on NFS mounted storage.
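
A variant of the same idea lets mktemp pick the unique directory name and removes the directory even when the program fails. This is a minimal sketch, not a tested site recipe: the function name run_with_private_mcr_cache is hypothetical, and the use of TMPDIR (falling back to /tmp) is an assumption about the node environment.

```shell
#!/bin/bash
# Sketch: run a command with its own MCR cache directory, then clean up.
# run_with_private_mcr_cache is a hypothetical helper name.
run_with_private_mcr_cache() {
    local cache status=0
    # mktemp -d creates a new, uniquely named directory.
    cache=$(mktemp -d "${TMPDIR:-/tmp}/mcr-${USER:-user}-XXXXXX") || return 1
    # Export the variable in a subshell so it does not leak into the caller.
    ( export MCR_CACHE_ROOT=$cache; "$@" ) || status=$?
    # Remove the cache whether or not the command succeeded.
    rm -rf "$cache"
    return "$status"
}

# Demonstration with a harmless command. In a job script the call would be
#   run_with_private_mcr_cache ./matlab_executable
run_with_private_mcr_cache sh -c 'echo "cache dir: $MCR_CACHE_ROOT"'
```

Passing the exit status through lets the batch system still see a failed run as failed.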

For more background information, please see http://undocumentedmatlab.com/blog/speeding-up-compiled-apps-startup (this has not been thoroughly tested at UIBK - feedback is welcome).

Matlab resource requirements

As Matlab requires large amounts of virtual memory and tries to use all installed CPUs unless you limit the degree of parallelism, you need to explicitly control the number of processors used by Matlab consistent with your job's allocation.

Evaluating your application

The following command (executed on the same node as the matlab command) might be helpful in evaluating your application:

ps -C MATLAB -o user,pid,vsz,pcpu,nlwp,cmd

The VSZ column shows the amount of virtual memory consumed by each MATLAB process. %CPU indicates the average CPU utilization; summed over all MATLAB processes of your job, it must never exceed 100 times the number of slots/cores reserved for your job. NLWP shows the number of threads spawned by each MATLAB process.

For a compiled application replace MATLAB with the name of your executable:

ps -C your_executable -o user,pid,vsz,pcpu,nlwp,cmd
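
The %CPU rule described above can be checked mechanically. The following sketch sums %CPU values read from standard input and compares the total with 100 times the slot count. The function name cpu_within_allocation is hypothetical, and the sample numbers are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: check that the summed %CPU of all processes stays within
# 100 * slots. cpu_within_allocation is a hypothetical helper name.
# Input: one %CPU value per line, e.g. from  ps -C MATLAB -o pcpu=
cpu_within_allocation() {
    local slots=$1
    awk -v slots="$slots" '
        { total += $1 }
        END {
            printf "total %%CPU: %.1f, limit: %d\n", total, slots * 100
            # Exit status 1 signals that the allocation is exceeded.
            exit (total > slots * 100 ? 1 : 0)
        }'
}

# Demonstration with made-up readings for a job with 4 slots.
printf '95.0\n180.5\n' | cpu_within_allocation 4
```

On a compute node the input would come from ps, for example ps -C MATLAB -o pcpu= | cpu_within_allocation "$NSLOTS". Because %CPU is an average over the process lifetime, a single reading is only a rough indication.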

Matlab virtual memory requirements

The table below lists the minimum amount of virtual memory required for a sequential Matlab session and for each additional parallel process on our various HPC systems. Make sure to specify enough memory for your Matlab jobs (add at least a few hundred megabytes for shell overhead, etc.).

Required virtual memory [GB]

Session type / System                                      LEO1   LEO2   LEO3   MACH
Plain Matlab GUI (matlab)                                  1.2    1.2    4.1    1.1
Matlab without GUI (matlab -nodisplay)                     1.0    1.1    2.4    1.0
Matlab without JVM (matlab -nojvm, implies -nodisplay)     0.6    0.6    0.8    0.6
Each additional parallel Matlab process (matlabpool <N>)   1.3    1.2    3.0    1.3


Note: The exact virtual memory requirement depends on a variety of factors, such as the Matlab version in use. The table above provides only rough estimates.
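
As a worked example of using the table, the estimate for a session is the base value plus the per-process value times the number of additional processes, plus some overhead. In the sketch below the function name matlab_vmem_estimate and the default overhead of 0.3 GB are assumptions. The example figures are the LEO3 values for matlab -nodisplay (2.4) and for each additional process (3.0).

```shell
#!/bin/bash
# Sketch: estimate total virtual memory (GB) from the table values.
# matlab_vmem_estimate and the 0.3 GB default overhead are hypothetical.
matlab_vmem_estimate() {
    local base_gb=$1 per_process_gb=$2 nprocs=$3 overhead_gb=${4:-0.3}
    # awk handles the decimal arithmetic that shell integers cannot.
    LC_ALL=C awk -v b="$base_gb" -v p="$per_process_gb" \
                 -v n="$nprocs" -v o="$overhead_gb" \
        'BEGIN { printf "%.1f\n", b + n * p + o }'
}

# LEO3, no GUI, 4 additional parallel processes: 2.4 + 4 * 3.0 + 0.3
matlab_vmem_estimate 2.4 3.0 4
```

The result is a lower bound for planning, since the table itself gives only rough minimum values.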

Parallel/multithreaded Matlab use

If you explicitly use Matlab's parallel functionality or if you use any parallelized Matlab feature (including most matrix and vector operations and solvers) you need to control the number of threads started by Matlab, or else Matlab will use more than the resources you requested in your qsub command. This will affect performance for you and everyone using the system.

Enforcing sequential execution

If you are unsure about the parallelism of your Matlab code and you don't use parallel functionality explicitly, play it safe and enforce the sequential execution of your application.

Run your matlab script myprog.m with the -singleCompThread runtime option:

matlab -singleCompThread -nojvm -nodisplay -batch myprog

or compile your application with this runtime option. See above.

For simple tasks involving only small matrices, this may even improve performance.

Running Matlab in a parallel environment

When you want to use Matlab's multithreading capabilities to run your Matlab script myprog.m:

  • Include the following lines in your SGE job script:
    # [...]
    #$ -pe openmp nslots
    #$ -l h_vmem=memory_per_slot
    # [...]
    module load matlab/version
    matlab -nojvm -nodisplay -batch myprog
    Here nslots is the number of desired threads.
  • SGE sets the NSLOTS environment variable accordingly. To communicate the NSLOTS variable to the Matlab runtime environment, add the following lines to your Matlab script myprog.m:
    % Read the slot count exported by SGE and limit Matlab's threads to it
    ncpus = str2double(getenv('NSLOTS'));
    maxNumCompThreads(ncpus);
    fprintf('Matlab set number of cpus to %i\n', ncpus);

Influencing the number of threads via environment variables

The following environment variables might also influence the parallel/multithreaded behavior of Matlab.

Environment variable   Value   Function
MKL_DYNAMIC            FALSE   Disables dynamic adjustment of the thread count in MKL-enabled functions
MKL_NUM_THREADS        N       Explicitly sets the number of MKL threads to N
OMP_DYNAMIC            FALSE   Disables dynamic adjustment of the thread count in OpenMP-enabled functions
OMP_NUM_THREADS        N       Explicitly sets the number of OpenMP threads to N
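
In a job script, these variables could be derived from NSLOTS before Matlab is started. This is a minimal sketch: the function name set_thread_limits is hypothetical, and falling back to a single thread when NSLOTS is unset is an assumption chosen to be on the safe side. Whether a given Matlab version honors these variables is not guaranteed, so the thread count should still be verified with ps as described above.

```shell
#!/bin/bash
# Sketch: export the thread-related variables from the table.
# set_thread_limits is a hypothetical helper name.
set_thread_limits() {
    # Use the argument if given, else NSLOTS from SGE, else 1.
    local n=${1:-${NSLOTS:-1}}
    export MKL_DYNAMIC=FALSE
    export OMP_DYNAMIC=FALSE
    export MKL_NUM_THREADS=$n
    export OMP_NUM_THREADS=$n
}

set_thread_limits
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"
```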