Using explicit platform descriptions to support programming of heterogeneous many-core systems
Highlights
► We define a platform description language (PDL) for heterogeneous many-core systems.
► We present a source-to-source transformation tool that utilizes PDL descriptions.
► Annotated code is transformed for different machine configurations.
Introduction
Recent developments in computer architecture have shown that heterogeneous many-core systems are a viable way to overcome major hardware design challenges related to energy consumption and thermal constraints while offering high peak computational performance. Prominent examples of this trend are the IBM Cell B.E., general-purpose GPU computing, and acceleration via reconfigurable hardware [1].
Heterogeneous many-core systems comprise different processing units specialized for specific computational tasks, e.g., GPUs for data-parallel or stream computations, and thus may achieve superior performance compared to homogeneous multi-core architectures. Although heterogeneous processing units can provide exceptional performance for many workloads, programming such systems is challenging. Software developers are forced to deal with a diversity of software libraries, runtime systems, and programming models. As Chamberlain et al. [2] state, a complex mixture of different languages, compilers, and runtime systems is inherent to many heterogeneous platforms. Even though initial standardization efforts exist (e.g., OpenCL [3]), the diversity of underlying hardware designs and the multitude of configuration options make it very difficult for users, let alone automatic tools, to optimize applications and effectively distribute workloads among heterogeneous processing units [4], [5].
Efficient application development and compilation often require detailed knowledge about the hardware configuration that exceeds the information exposed by programming models or the platform layer. Existing approaches commonly employ an implicit, abstracted view of the available processing resources (such as the OpenCL host-device model). Such implicit models, often based on control relationships between processing units, may not expose the information required to achieve efficient program execution on different classes of heterogeneous many-core systems.
In this paper, we propose explicit platform descriptors of heterogeneous many-core architectures and show how they can support high-level programming environments. To address the hierarchical aggregation of system components and the resulting need for efficient vertical data management in modern heterogeneous systems, we developed a hierarchical machine model together with an XML-based platform description language (PDL) capable of expressing characteristics of a large class of current and future heterogeneous many-core systems. The PDL is intended to make platform-specific information explicit (1) to expert programmers and (2) to tools such as autotuners, compilers, or runtime systems.
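To make the idea of a hierarchical, XML-based platform descriptor concrete, a minimal sketch of what such a description could look like is given below. The element and attribute names (`platform`, `processing-unit`, `memory-region`, `interconnect`) are illustrative assumptions, not the published PDL schema; the point is that nesting captures the hierarchical aggregation of components and the control relationships between them.

```xml
<!-- Hypothetical PDL descriptor sketch for a CPU/GPU node.
     All element/attribute names are illustrative, not the actual schema. -->
<platform name="cpu-gpu-node">
  <processing-unit id="cpu0" type="CPU" cores="8">
    <memory-region id="main" size="32GiB"/>
    <!-- The GPU is nested under the CPU that controls it -->
    <processing-unit id="gpu0" type="GPU">
      <memory-region id="device" size="3GiB"/>
      <interconnect to="main" kind="PCIe"/>
    </processing-unit>
  </processing-unit>
</platform>
```

Unlike a fixed host-device model, a descriptor of this shape can express arbitrary nesting depths, so the same vocabulary covers flat multi-cores, accelerator-equipped nodes, and deeper hierarchies.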
We demonstrate the usefulness of our approach with a source-to-source transformation framework in the context of a task-based programming model, where expert programmers provide, within the framework, different implementation variants of major computational tasks tailored to different types of heterogeneous execution units. The platform requirements of task implementation variants are made explicit by means of corresponding PDL descriptions. Non-expert or mainstream programmers use annotations to mark functions in the sequential source code as tasks and may specify references to PDL descriptors to support the selection of suitable implementation variants. This user-provided information can help compilers and runtime systems optimize the mapping of computational tasks to processing units. Moreover, we provide data-partitioning annotations for array parameters of tasks, which the transformation system uses to split tasks operating on large arrays into many smaller tasks operating on corresponding sub-arrays.
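A minimal sketch of such an annotated task is shown below. The pragma keywords (`task`, `variant`, `partition`) and the descriptor file name are hypothetical placeholders for whatever concrete annotation syntax the framework defines; only the overall pattern, a sequential function marked as a task with a PDL reference and a blockwise partitioning hint for its array parameters, is taken from the text.

```cpp
#include <cstddef>

// Hypothetical annotation syntax (illustrative, not the framework's
// published keywords): mark the function as a task, reference a PDL
// descriptor for variant selection, and request blockwise partitioning
// of the array parameters so the tool can emit many smaller sub-tasks.
//
// #pragma task variant(pdl:"gpu-node.xml") partition(a:block, b:block)
void vector_scale(double *a, const double *b, double s, std::size_t n) {
    // Plain sequential body; each generated sub-task would run this
    // loop on one contiguous sub-array.
    for (std::size_t i = 0; i < n; ++i)
        a[i] = s * b[i];
}
```

Because the annotation carries no platform-specific code itself, the same source can be retargeted by swapping the referenced PDL descriptor.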
We have developed a prototypical source-to-source compiler that supports our task annotations for C/C++. Our compiler takes as input an annotated serial task-based program and outputs, parametrized via PDL descriptors, code for a specific heterogeneous many-core target system. By varying the target PDL descriptor, our compiler can generate code for different target architecture configurations without modifying the source program. Experiments with a commonly used scientific kernel and a financial application on different configurations of a state-of-the-art CPU/GPU system demonstrate the potential of our approach. In these experiments, our framework targets a flexible heterogeneous runtime system [6] that is capable of scheduling computational tasks to the different execution units of a heterogeneous many-core system. The runtime system takes care of data dependencies between tasks; if no data dependencies exist between a set of tasks, those tasks may be executed on different execution units in parallel. As a consequence, this approach can exploit multiple levels of parallelism: task parallelism, if two or more tasks are executed concurrently on different execution units, and data parallelism, if a task internally relies on a data-parallel execution model such as CUDA or OpenCL.
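The execution model described above can be illustrated with a small self-contained sketch, assuming plain CPU threads as stand-ins for heterogeneous execution units (this is not the StarPU API actually used by the framework). Two tasks without data dependencies run concurrently; their results are combined only after both complete, which is exactly the situation in which the runtime system is free to place them on different units.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// One "task": sum a contiguous slice of the input vector.
double sum_range(const std::vector<double> &v, std::size_t lo, std::size_t hi) {
    return std::accumulate(v.begin() + lo, v.begin() + hi, 0.0);
}

// Two independent tasks (no data dependencies: they read disjoint
// slices and write separate results) run in parallel, mimicking a
// runtime that maps them to different execution units.
double parallel_sum(const std::vector<double> &v) {
    double s0 = 0.0, s1 = 0.0;
    const std::size_t mid = v.size() / 2;
    std::thread t0([&] { s0 = sum_range(v, 0, mid); });        // task 1
    std::thread t1([&] { s1 = sum_range(v, mid, v.size()); }); // task 2
    t0.join();
    t1.join();
    return s0 + s1; // combine after both independent tasks finish
}
```

In the real framework, each such task could additionally be a data-parallel kernel internally (e.g., a CUDA or OpenCL implementation variant), which is how task and data parallelism compose.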
This paper is structured as follows. Section 2 discusses the context and related work. Section 3 gives an overview of our approach and describes the platform description language. Section 4 introduces our programming framework for heterogeneous systems. Experimental results are presented in Section 5. Section 6 summarizes our approach and gives an outlook to future developments.
Related work
Programming heterogeneous systems is targeted by the OpenCL [3] and Nvidia CUDA [7] projects. Both programming models use a similar hierarchical platform abstraction involving a fixed control relationship (host-device) to support portability between heterogeneous hardware elements. Our PDL can be seen as a generic approach to represent such platform patterns. It is not limited to a specific hierarchy of control relationships that may only be well suited for specialized classes of heterogeneous…
Platform description language
In this section we give an overview of our approach, describe the hierarchical machine model and corresponding XML-based platform description language (PDL).
Cascabel source-to-source translator
To show the applicability of the proposed PDL for a task-based programming model, we have developed Cascabel, a prototypical code-generation system for offloading computational tasks to heterogeneous processing units. The Cascabel programming model is based on sequential, high-level, task-based input programs with additional source-code annotations. Annotations indicate function invocations (computational tasks) suitable for execution on possibly heterogeneous processing units. We define tasks as…
Experiments
To evaluate our framework, we investigate translations from task-based sequential input programs to output programs tailored for parallel execution on a heterogeneous GPU-equipped system. Translation is based on the previously introduced source-code annotations in combination with different target platform descriptors. Each target platform descriptor expresses a different machine configuration for static selection of task implementation variants and for runtime-system configuration.
Summary and outlook
The shift to heterogeneous many-core systems requires rethinking existing programming models and software development frameworks. The diversity of software libraries, runtime systems, and architectural features inherent to such platforms raises many challenges for application developers and tool chains.
In this paper we have presented an XML-based platform description language (PDL) that captures key architectural patterns of commonly used heterogeneous computing systems.
Acknowledgment
We would like to thank the INRIA Runtime Team for providing StarPU. This work received funding from the EU under grant agreement no. 248481 (PEPPHER project, www.peppher.eu).
References (29)
- et al., State-of-the-art in heterogeneous computing, Sci. Program. (2010)
- R.D. Chamberlain, M.A. Franklin, E.J. Tyson, J. Buhler, S. Gayen, P. Crowley, J.H. Buckley, Application development on...
- A. Munshi, The OpenCL specification, version 1.1, September...
- K. Komatsu, K. Sato, Y. Arai, K. Koyama, H. Takizawa, H. Kobayashi, Evaluating performance and portability of OpenCL...
- S. Rul, H. Vandierendonck, J. D'Haene, K. De Bosschere, An experimental study on performance portability of OpenCL...
- et al., StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Euro-Par 2009 Parallel Process. (2009)
- et al., Scalable parallel programming with CUDA, Queue (2008)
- K. Fatahalian, D.R. Horn, T.J. Knight, L. Leem, M. Houston, J.Y. Park, M. Erez, M. Ren, A. Aiken, W.J. Dally, et al., ...
- B. Alpern, L. Carter, J. Ferrante, Modeling parallel computers as memory hierarchies, in: Programming Models for...
- et al., Hierarchical place trees: a portable abstraction for task parallelism and data movement, Lang. Compilers Parallel Comput. (2010)
- Elastic computing: a framework for transparent, portable, and adaptive multi-core heterogeneous computing, ACM SIGPLAN Notices
- Ompss: a proposal for programming heterogeneous multi-core architectures, Parallel Process. Lett.