The ITP Compute Cluster REGULUS

Current workload (ITP only)
Ganglia Cluster Report  (ITP only)
Nagios Monitoring (ITP only)
System Features
Use of the REGULUS Compute Cluster
User Accounts
Installed Software
Status of the REGULUS Compute Cluster
Contact

Adapted from the cluster pages at http://www.uibk.ac.at/zid/systeme/hpc-systeme by courtesy of the HPC Team of the Central Information Technology Services (ZID).

System Features

The REGULUS Compute Cluster consists of one master node (regulus.uibk.ac.at) and 48 compute nodes (20 Intel Xeon Eight-Core, 4 Intel Xeon Ten-Core, 8 Intel Xeon Sixteen-Core and 16 Intel Xeon Twenty-Core machines, with a total of 2592 cores) and offers 10 TB (gross) of attached storage.

The master node is a transtec CALLEO Application Server 2280 with 2 Six-Core Intel Xeon E5-2620v3 (2.4 GHz) processors (Haswell) and 64 GB DDR4 Reg. ECC RAM. The attached storage consists of 2 SATA-3 Samsung SSD disks (each 250 GB) configured as RAID1 for the OS file systems and 6 SATA-3 disks (each 3 TB) configured as RAID6 for the /home and /scratch file systems.

The compute nodes are:

    • node01 - node08: Two Twin² server barebones (2U Novarion Quanton Twin² 2200W High-Performance Server systems) with 4 nodes each. Every node has two Twenty-Core Intel Xeon Gold 6230 (2.1 GHz) processors (Cascade Lake-SP) and 192 GB DDR4-2933 Reg. ECC RAM. The attached 1 TB SATA-3 SSD provides the local swap and /tmp file system.
    • node09 - node28: 6 Twin² server barebones (2U transtec CALLEO High-Performance Server 2880 systems) with 4 nodes each. Every node has two Eight-Core Intel Xeon E5-2640v3 (2.6 GHz) processors (Haswell) and 64 GB DDR4 Reg. ECC RAM, except node17 - node20 and node25 - node28, which have 256 GB DDR4 Reg. ECC RAM. The attached 1 TB SATA-3 disk provides the local swap and /tmp file system.
    • node29 - node32: One Twin² server barebone (2U transtec CALLEO High-Performance Server 2880 system) with 4 nodes. Every node has two Ten-Core Intel Xeon E5-2640v4 (2.4 GHz) processors (Broadwell) and 256 GB DDR4 Reg. ECC RAM. The attached 1 TB SATA-3 disk provides the local swap and /tmp file system.
    • node33 - node40: Two Twin² server barebones (2U transtec CALLEO High-Performance Server 2880 systems) with 4 nodes each. Every node has two Sixteen-Core Intel Xeon E5-2683v4 (2.1 GHz) processors (Broadwell) and 256 GB DDR4 Reg. ECC RAM. The attached 1 TB SATA-3 disk provides the local swap and /tmp file system.
    • node41 - node48: Two Twin² server barebones (2U Novarion Quanton Twin² 2200W High-Performance Server systems) with 4 nodes each. Every node has two Twenty-Core Intel Xeon Gold 6230 (2.1 GHz) processors (Cascade Lake-SP) and 192 GB DDR4-2933 Reg. ECC RAM. The attached 240 GB SATA-3 SSD provides the local swap and /tmp file system.

The hardware was purchased from transtec AG and Novarion Systems GmbH.
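As a quick consistency check on the figures above, the per-node core counts can be summed. The sketch below assumes that the total of 2592 quoted in the overview counts two hardware threads per physical core (Hyper-Threading); that reading is an assumption and is not stated on this page. The variable names are illustrative only.

```shell
#!/bin/sh
# Physical cores per node group: nodes x sockets x cores per socket,
# taken from the node list above.
gold_a=$((8 * 2 * 20))     # node01 - node08, Xeon Gold 6230
haswell=$((20 * 2 * 8))    # node09 - node28, Xeon E5-2640v3
bdw10=$((4 * 2 * 10))      # node29 - node32, Xeon E5-2640v4
bdw16=$((8 * 2 * 16))      # node33 - node40, Xeon E5-2683v4
gold_b=$((8 * 2 * 20))     # node41 - node48, Xeon Gold 6230

physical=$((gold_a + haswell + bdw10 + bdw16 + gold_b))

# ASSUMPTION: two hardware threads per core (Hyper-Threading enabled).
logical=$((physical * 2))

echo "physical cores: $physical"
echo "logical cores:  $logical"
```

Under that assumption the script reports 1296 physical and 2592 logical cores, which matches the total given in the overview.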

Use of the REGULUS Compute Cluster

How can I use the cluster?

To get a quick overview, see the Short Tutorial or have a look at the following specific topics:

Where can I store my data?

Three different types of storage are available on the cluster:

Each entry below lists the directory, its storage integration, and a description.

/home (shared to all nodes)
  • High-quality RAID6 storage with a quota limit of 1.9 GB.
  • All data in the home directories is backed up daily by the Tivoli Storage Manager (TSM).
/scratch (shared to all nodes)
  • This storage should be used for large input and/or output files that your applications need.
  • A RAID6 configuration prevents data loss caused by a disk failure. Please note that there are no backups of /scratch. Users are responsible for safeguarding their data according to their needs.
  • The (hard) quota limit is 47.7 GB.
/tmp (local on every node)
  • This data area may be used for temporary files during job execution.
  • Limits: 914 GB on nodes 01-29, 70 GB on nodes 33-36, 143 GB on nodes 30-32 and 37-40.
    At the moment there are no quota limitations.
  • The HPC Cluster administrators are allowed to delete files in this directory after every job run if there is not enough space left.
  • Note: Files are deleted automatically by tmpwatch once they have not been accessed for 240 hours (atime)!
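To see which temporary files have reached the 240-hour tmpwatch threshold, something like the following may help. It is a minimal sketch: the function name stale_files is hypothetical, and it relies only on the standard find option -atime.

```shell
#!/bin/sh
# stale_files DIR [DAYS]
# List regular files under DIR whose last access is at least DAYS days old
# (default 10 days = 240 hours, the tmpwatch threshold quoted above).
stale_files() {
    dir=$1
    days=${2:-10}
    # find rounds file ages down to whole days, so "-atime +9" matches
    # files that were last accessed 10 or more days ago.
    find "$dir" -type f -atime +"$((days - 1))" 2>/dev/null
}

# Example call on a compute node:
#   stale_files /tmp
```

Files listed by this call are candidates for automatic deletion, so copy anything you still need to /home or /scratch.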

Please note that the directories /home and /scratch are available on every cluster node. These directories are shared, whereas the /tmp directory is local on every node.
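Because /tmp is node-local and /scratch is shared, one common pattern is to compute in a private directory under /tmp and copy the results to shared storage at the end. The sketch below illustrates this; the function name stage_job, the file name output.dat and the per-user path under /scratch are assumptions made for the example, not conventions of this cluster.

```shell
#!/bin/sh
# stage_job TMP_BASE RESULT_DIR
# Run a placeholder computation in a private directory below TMP_BASE and
# copy its output to RESULT_DIR afterwards.
stage_job() {
    tmp_base=$1
    result_dir=$2

    # Private working directory, e.g. /tmp/job.Ab12Cd on a compute node.
    workdir=$(mktemp -d "$tmp_base/job.XXXXXX") || return 1

    # Placeholder for the real application: it writes one output file.
    echo "result" > "$workdir/output.dat"

    # Copy the result to shared storage so it is visible on every node.
    mkdir -p "$result_dir" && cp "$workdir/output.dat" "$result_dir/"
    status=$?

    # Remove the local directory so that /tmp does not fill up.
    rm -rf "$workdir"
    return $status
}

# On the cluster this might be called as (paths are assumptions):
#   stage_job /tmp "/scratch/$USER/results"
```

Cleaning up explicitly is preferable to relying on tmpwatch or on the administrators to free the space.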

If the above-mentioned quota limits are too strict for your needs, please contact the system administrator (system-admin[at]mcavity.uibk.ac.at). The limits can be increased if there are good reasons to do so.

User Accounts

The user accounts on the cluster are managed via the Network Information Service (NIS). Ordinary users can run the command yppasswd to change their NIS password. It will prompt them for their old NIS password and then ask them for their new password twice, to ensure that the new password was typed correctly.

$ yppasswd
Changing NIS account information for "user ID" on regulus.uibk.ac.at.
Please enter old password: <type old value here>
Changing NIS password for "user ID" on regulus.uibk.ac.at.
Please enter new password: <type new value>
Please retype new password: <re-type new value>

The NIS password has been changed on regulus.uibk.ac.at.

If the old value does not match the current value stored for that user, or the two new values do not match each other, then the password will not be changed.

Backup of Home Directories

The files in the "home" directories of the HPC Cluster are automatically backed up daily by centrally scheduled ADSM/TSM tasks. Thus, if you have deleted files or directories in your home directory by mistake, you can restore them yourself by running the command /usr/bin/dsm at the command line on the master node, where your home directory is physically located (see Restore deleted files/directories).

Note: Files in directories named tmp or cache will NOT be backed up!

For more information about ADSM/TSM, see Tivoli Storage Manager or ADSM/TSM-Server (FABS: HSM, Backup) and TSM V5: Sichern und Wiederherstellen von Daten at the ZID home page.

User Quota

In order to get a better grip on some of the more pesky users of the REGULUS Compute Cluster, we have activated quotas on the central file server regulus. To check your disk quota, type quota at the command line. The quota(1) command displays the current disk usage along with your personal limits for disk space (blocks) and number of inodes (files). The soft limit (quota) can be temporarily exceeded (for a grace period of 7 days), whereas the hard limit (limit) is an absolute upper bound.
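When the quota report shows that you are close to a limit, it helps to know which directories are responsible. The following is a minimal sketch; the helper name largest_entries is hypothetical and uses only the standard du, sort and head tools.

```shell
#!/bin/sh
# largest_entries [DIR] [COUNT]
# Print the COUNT largest entries directly inside DIR (sizes in KB),
# biggest first. Hidden files (names starting with ".") are not listed.
largest_entries() {
    dir=${1:-$HOME}
    count=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Typical use on the master node:
#   quota -s           # current usage and limits in human-readable units
#   largest_entries    # what is taking up the space in $HOME
```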

If you run out of quota, you might first choose to tar and gzip directories. This is convenient as the kfm-window manager allows you to view and manipulate tgz files just like ordinary directories. Other simple strategies are:

  • remove unused dvi, aux and log files;
  • clear your Firefox cache;
  • avoid keeping many huge MATLAB data files if possible.
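The tar and gzip step mentioned above can be sketched as follows. The function name archive_dir and the directory name in the example are hypothetical; the original directory is removed only after the archive has been read back successfully.

```shell
#!/bin/sh
# archive_dir DIR
# Pack DIR into DIR.tgz, check that the archive is readable, and only
# then remove the original directory.
archive_dir() {
    dir=${1%/}                                   # strip a trailing slash
    tar -czf "$dir.tgz" "$dir" || return 1       # create compressed archive
    tar -tzf "$dir.tgz" > /dev/null || return 1  # verify by listing it
    rm -rf "$dir"
}

# Example (directory name is hypothetical):
#   archive_dir old_project
#   tar -xzf old_project.tgz    # unpack again when needed
```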

Note: If your (hard) quota is exceeded and/or you do not have any grace time for your soft quota, your submitted jobs will be aborted (due to write errors)!

Installed Software

  1. Operating System
    The cluster is running CentOS 6.8, a compatible rebuild of Red Hat Enterprise Linux, on the master node and on the compute nodes.
  2. Job-Management-System
    Submit your jobs via Son of Grid Engine 8.1.8.
  3. Software environment
    Set up your software environment by using Modules environment 3.2.10.
  4. Software packages
    Have a look at the available packages like compilers, parallel environments, numerical libraries, scientific applications, etc.
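A job script for the Grid Engine typically combines the scheduler and module steps listed above. The following is a minimal sketch, not a template provided by the administrators: the job name and the module name gcc are assumptions, and any queue or parallel environment settings specific to this cluster are omitted.

```shell
#!/bin/sh
# Job name (hypothetical).
#$ -N example_job
# Run the job in the directory it was submitted from.
#$ -cwd
# Merge standard error into standard output.
#$ -j y

# ASSUMPTION: "gcc" stands in for whatever module your program needs;
# check "module avail" for the names installed on the cluster.
if command -v module > /dev/null 2>&1; then
    module load gcc
fi

# NSLOTS is set by Grid Engine inside a running job; default to 1 elsewhere.
slots=${NSLOTS:-1}
echo "Running on $(hostname) with $slots slot(s)"
```

Saved as example_job.sh, the script would be submitted with qsub example_job.sh, and qstat shows the state of your jobs.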

Contact

If you need additional information about the REGULUS Compute Cluster, need an account on it, or have problems with applications, please contact your system administrators:

E-mail address: system-admin[at]mcavity.uibk.ac.at
Phone number: 52212