High Performance Parallel and Distributed Computing:
Automated Tuning of HPC Algorithms and Applications

Project Leader: Thomas Fahringer
Institute of Computer Science - Distributed and Parallel Systems Group

The planned research in the field of High Performance Parallel and Distributed Computing is in Automated Tuning of Parallel and Distributed Applications, including the thesis topics outlined below.

The projects will contribute to the DK+ by parallelizing and optimizing the numerical algorithms and scientific applications of the DK+ partners.

Within the DK+ curriculum, participants will be trained in parallel programming, high performance computing, and scientific computing on the basis of tools that embody the most current research.

Parallel Computing:
Multi-Objective Optimization of Message Passing Programs

Message passing is presumably the most widely used programming model for parallel architectures with distributed and shared memory. It is well suited to program development on clusters of symmetric multiprocessor (SMP) nodes with multicore processors, connected via a high-speed communication network. The low level of message passing primitives enables the user to implement any sort of parallel algorithm. In addition, by allowing for explicit control of data locality, the message passing model fosters the optimal utilization of modern memory hierarchies, in particular caches. On the other hand, it burdens the programmer with the full responsibility for writing efficient code, which is not an easy task. It is even more complex to tune an application for optimized execution on a specific target architecture.
The goal of this thesis is to design and implement an optimizing compiler which, starting from a given message passing (MPI) program, both applies static (compile-time) program transformations and adjusts parameters of the runtime environment in order to optimize several non-functional parameters, including execution time, memory requirements, energy consumption, cost, and reliability, for a specific target architecture. The novelty of this approach will be the use of supervised learning techniques to drive the selection of program transformations and runtime parameter settings based on training data collected offline.
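To make the notion of a runtime tuning parameter concrete, the following minimal C/MPI sketch exposes one such knob: the chunk size used to transfer a fixed payload between two ranks. The payload size, the knob itself, and the timing harness are illustrative assumptions, not part of the project description; a tuner of the kind proposed here could learn such settings per target architecture from offline training runs.

/* Minimal MPI sketch with one tunable runtime parameter (chunk size).
 * Compile: mpicc -O2 chunked_send.c -o chunked_send
 * Run:     mpirun -np 2 ./chunked_send <chunk_bytes>
 * Hypothetical example: the chunk size is the kind of knob an automated
 * tuner could learn to set for a specific target architecture. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOTAL_BYTES (1L << 24)   /* 16 MiB payload (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long chunk = (argc > 1) ? atol(argv[1]) : TOTAL_BYTES;
    if (chunk <= 0) chunk = TOTAL_BYTES;

    char *buf = malloc(TOTAL_BYTES);
    memset(buf, rank, TOTAL_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    /* transfer the payload in chunks; rank 0 sends, rank 1 receives */
    for (long off = 0; off < TOTAL_BYTES; off += chunk) {
        int len = (int)((off + chunk <= TOTAL_BYTES) ? chunk
                                                     : TOTAL_BYTES - off);
        if (rank == 0)
            MPI_Send(buf + off, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf + off, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("chunk=%ld bytes  time=%.6f s\n", chunk, t1 - t0);

    free(buf);
    MPI_Finalize();
    return 0;
}

Running the sketch with a range of chunk sizes produces exactly the kind of labeled (configuration, performance) data on which a supervised learner could be trained offline.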

Parallel Computing:
Instrumentation, Measurement and Analysis of Multiple Nonfunctional Parameters

Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architecture towards multi- and many-core designs. From now on, virtually every computing device will be based on multi-core processors. A major barrier to the widespread usage of parallel computing systems is the expense and complexity of creating, tuning, running, porting, and maintaining parallel applications. A prerequisite for understanding and optimizing codes for multi-core parallel systems is the availability of effective tools for instrumentation, measurement, and analysis of several non-functional parameters, including execution time, memory requirements, energy consumption, cost, and reliability, at the hardware, operating system, network, and program level. Technologies and tools exist to measure individual non-functional parameters; however, there is hardly any tool which can provide multiple parameters. Furthermore, most tools lack the ability to explore these parameters for individual code regions or under external computing and network load.
The goal of this thesis is to design and implement a novel tool that can instrument, measure, and analyse multiple non-functional parameters at various levels of abstraction. A particular novelty will be the ability to interpret the derived non-functional parameter data in a customizable way, so that any client or human user of this tool can be directed to those parts of a system or application which require attention (for instance, for further optimization).
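As a minimal sketch of what region-level measurement of two such non-functional parameters could look like, the following C fragment records wall-clock time and peak resident set size for a single code region using the standard POSIX interfaces clock_gettime and getrusage. Energy and network measurements, region nesting, and the interpretation layer described above are deliberately omitted; all names are illustrative assumptions, not the proposed tool's API.

/* Region-level measurement sketch (POSIX): wall-clock time via
 * clock_gettime, peak resident set size via getrusage. A full tool would
 * add further parameters (e.g., energy via hardware counters) and nested
 * regions; this only illustrates the per-code-region idea. */
#include <stdio.h>
#include <time.h>
#include <sys/resource.h>

typedef struct {
    struct timespec start;   /* wall-clock timestamp at region entry */
} region_t;

static void region_enter(region_t *r)
{
    clock_gettime(CLOCK_MONOTONIC, &r->start);
}

static void region_exit(region_t *r, const char *name)
{
    struct timespec end;
    clock_gettime(CLOCK_MONOTONIC, &end);

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);   /* ru_maxrss: peak RSS (KiB on Linux) */

    double secs = (end.tv_sec - r->start.tv_sec)
                + (end.tv_nsec - r->start.tv_nsec) * 1e-9;
    printf("region %-12s  wall=%.6f s  peak_rss=%ld KiB\n",
           name, secs, ru.ru_maxrss);
}

int main(void)
{
    region_t r;
    region_enter(&r);

    /* --- instrumented code region: dummy numerical work --- */
    volatile double acc = 0.0;
    for (long i = 1; i < 20000000; i++)
        acc += 1.0 / (double)i;
    /* --- end of instrumented region --- */

    region_exit(&r, "dummy_loop");
    return 0;
}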

Parallel Computing:
Multi-parameter Scheduling for Grid and Cloud Applications

Scheduling applications in distributed environments such as computational Grid and Cloud environments needs to consider a broad variety of important parameters in the optimisation process that have been largely ignored so far. Related work on scheduling has almost exclusively concentrated on execution time minimisation, while more recent work tries to broaden this scope by addressing at most two parameters, usually execution time and reliability. Beyond execution time and reliability, there is a series of parameters of equal importance that need to be considered, such as cost, resource utilisation, efficiency, power consumption, security, or fairness. Some of these parameters are oriented towards the user's needs (execution time, reliability, real-time metrics), while others represent the provider's interest (resource utilisation, efficiency, power consumption, fairness). Most of these parameters are conflicting and cannot be simultaneously optimised. The solution to such a multi-criteria optimisation problem is usually a Pareto set containing a wide range of potentially good solutions (an (n-1)-dimensional hypersurface, where n is the number of objectives).
The challenge of our research is to properly formulate the multi-criteria optimisation problem such that the Pareto set is reduced to a small set of potential solutions, ideally a single one. Since there are multiple independent actors in distributed Grid and Cloud environments (multiple resource providers and schedulers), the scheduling has to take place as a distributed negotiation between the scheduler and multiple providers offering virtualised resources. We will research and compare various multi-criteria heuristics with respect to different criteria combinations, such as the weighted sum, compromise programming, epsilon-constraint, physical programming, and normal constraint methods.
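As an illustration of the simplest of these methods, weighted-sum scalarisation, the following C sketch scores three hypothetical resource candidates against three objectives (execution time, cost, energy) and selects the minimum. All objective values and weights are invented for illustration; the other methods listed above would replace the scoring step while keeping the same overall structure.

/* Weighted-sum scalarisation sketch: reduce a multi-criteria scheduling
 * decision to a single score per candidate resource and pick the minimum.
 * Candidates, objective values, and weights are purely illustrative. */
#include <stdio.h>

#define N_CAND 3
#define N_OBJ  3

int main(void)
{
    /* cand[i][j]: objective j of candidate i (time in s, cost, energy in J) */
    double cand[N_CAND][N_OBJ] = {
        { 120.0, 2.5, 900.0 },   /* fast but expensive */
        { 300.0, 0.8, 600.0 },   /* cheap but slow */
        { 180.0, 1.5, 700.0 },   /* balanced */
    };
    double weight[N_OBJ] = { 0.5, 0.3, 0.2 };  /* user/provider preferences */

    /* normalise each objective by its maximum so units become comparable */
    double maxv[N_OBJ] = { 0.0, 0.0, 0.0 };
    for (int i = 0; i < N_CAND; i++)
        for (int j = 0; j < N_OBJ; j++)
            if (cand[i][j] > maxv[j]) maxv[j] = cand[i][j];

    int best = 0;
    double best_score = 1e300;
    for (int i = 0; i < N_CAND; i++) {
        double score = 0.0;
        for (int j = 0; j < N_OBJ; j++)
            score += weight[j] * cand[i][j] / maxv[j];
        printf("candidate %d: score %.4f\n", i, score);
        if (score < best_score) { best_score = score; best = i; }
    }
    printf("selected candidate %d\n", best);
    return 0;
}

Note that the weight vector encodes exactly the user- versus provider-oriented trade-offs discussed above; choosing it is itself part of the research question, since a fixed weighting collapses the Pareto set to one solution only for that particular preference.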