Elsevier

Parallel Computing

Volume 38, Issues 1–2, January–February 2012, Pages 52-65

Using explicit platform descriptions to support programming of heterogeneous many-core systems

https://doi.org/10.1016/j.parco.2011.10.008

Abstract

Heterogeneous many-core systems constitute a viable approach for coping with power constraints in modern computer architectures and can now be found across the whole computing landscape, ranging from mobile devices to desktop systems and servers, all the way to high-end supercomputers and large-scale data centers. While these systems promise to offer superior performance-power ratios, programming heterogeneous many-core architectures efficiently has been shown to be notoriously difficult. Programmers are typically forced to take into account a plethora of low-level architectural details and usually have to resort to a combination of different programming models within a single application. In this paper we propose a platform description language (PDL) that makes it possible to capture key architectural patterns of commonly used heterogeneous computing systems. PDL architecture descriptions support both programmers and toolchains by providing platform-specific information in a well-defined and explicit manner. We have developed a prototype source-to-source compilation framework that utilizes PDL descriptors to transform sequential task-based programs with source code annotations into a form that is convenient for execution on heterogeneous many-core systems. Our framework relies on a component-based approach that accommodates different implementation variants of tasks, customized for different parts of a heterogeneous platform, and utilizes an advanced runtime system for exploiting parallelism through dynamic task scheduling. We show various usage scenarios of our PDL and demonstrate the effectiveness of our framework for a commonly used scientific kernel and a financial application on different configurations of a state-of-the-art CPU/GPU system.

Highlights

► We define a platform description language (PDL) for heterogeneous many-core systems.
► We present a source-to-source transformation tool that utilizes PDL descriptions.
► Annotated code is transformed for different machine configurations.

Introduction

Recent developments in computer architecture have shown that heterogeneous many-core systems are a viable way to overcome major hardware design challenges related to energy consumption and thermal constraints while offering high peak computational performance. Prominent examples of this trend are the IBM Cell B.E., general-purpose GPU computing, and acceleration via reconfigurable hardware [1].

Heterogeneous multi-core systems comprise different processing units specialized for specific computational tasks, e.g., GPUs for data-parallel or stream computations, and thus may achieve superior performance compared to homogeneous multi-core architectures. Although heterogeneous processing units may provide exceptional performance for many workloads, programming such systems is a challenging task. Software developers are forced to deal with a variety of software libraries, runtime systems and programming models. As Chamberlain et al. [2] state, a complex mixture of different languages, compilers and runtime systems is inherent to many heterogeneous platforms. Even though first standardization approaches exist (e.g., OpenCL [3]), the diversity of underlying hardware designs and the multitude of configuration options make it very difficult for users, let alone automatic tools, to optimize applications and effectively distribute workloads among heterogeneous processing units [4], [5].

Efficient application development and compilation often require detailed knowledge about the hardware configuration that usually exceeds the information exposed by programming models or platform layers. Existing approaches commonly employ an implicit and abstracted view of the available processing resources (such as the OpenCL host-device model). Such implicit models, often based on control relationships between processing units, may not expose the information required to achieve efficient program execution on different classes of heterogeneous many-core systems.

In this paper, we propose explicit platform descriptors of heterogeneous many-core architectures and show how these can support high-level programming environments. To address the hierarchical aggregation of system components and the resulting need for efficient vertical data management in modern heterogeneous systems, we developed a hierarchical machine model together with an XML-based platform description language (PDL) capable of expressing characteristics of a large class of current and future heterogeneous many-core systems. The PDL is intended to make platform-specific information explicit (1) to expert programmers and (2) to tools such as autotuners, compilers or runtime systems.
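
As a rough illustration of what an explicit, hierarchical platform descriptor could look like, the following Python sketch parses a small XML document into a tree of processing units and queries it. The element and attribute names (`platform`, `pu`, `role`, `kind`, `count`) and the helper functions are assumptions invented for this example; they are not the PDL schema defined in the paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical descriptor: element and attribute names are assumptions
# made for illustration, not the actual PDL schema.
DESCRIPTOR = """
<platform name="cpu-gpu-node">
  <pu id="host" role="master" kind="x86_64" count="1">
    <pu id="cores" role="worker" kind="x86_64" count="8"/>
    <pu id="gpus" role="worker" kind="gpu" count="2"/>
  </pu>
</platform>
"""


@dataclass
class ProcessingUnit:
    id: str
    role: str
    kind: str
    count: int
    children: list = field(default_factory=list)


def parse_pu(elem):
    """Recursively build a ProcessingUnit tree from a <pu> element."""
    return ProcessingUnit(
        id=elem.get("id"),
        role=elem.get("role"),
        kind=elem.get("kind"),
        count=int(elem.get("count", "1")),
        children=[parse_pu(child) for child in elem.findall("pu")],
    )


def parse_platform(xml_text):
    """Return the list of top-level processing units in the descriptor."""
    root = ET.fromstring(xml_text.strip())
    return [parse_pu(elem) for elem in root.findall("pu")]


def count_units(pu, kind):
    """Sum the `count` of all units of the given kind in the subtree."""
    own = pu.count if pu.kind == kind else 0
    return own + sum(count_units(child, kind) for child in pu.children)


platform = parse_platform(DESCRIPTOR)
print(count_units(platform[0], "gpu"))  # prints 2
```

A tool such as a compiler or autotuner could walk a tree like this to find out which kinds of execution units are present before deciding how to map tasks.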

We demonstrate the usefulness of our approach with a source-to-source transformation framework in the context of a task-based programming model, in which expert programmers provide, for each major computational task, different implementation variants tailored to different types of heterogeneous execution units. The platform requirements of task implementation variants are made explicit by means of corresponding PDL descriptions. Non-expert or mainstream programmers use annotations to mark functions in the sequential source code as tasks and may specify references to PDL descriptors to support the selection of suitable implementation variants. Thus, information provided by users can help compilers and runtime systems optimize the mapping of computational tasks to processing units. Moreover, we provide data partitioning annotations for array parameters of tasks, which the transformation system uses to split tasks operating on large arrays into many smaller tasks operating on corresponding sub-arrays.
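
The data partitioning idea can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions (one-dimensional data, fixed block size); the function names `partition_blocks`, `split_task` and `scale` are hypothetical and do not reflect the annotation syntax or the code generated by the framework.

```python
def partition_blocks(length, block_size):
    """Return (start, stop) index ranges that cover [0, length)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(start, min(start + block_size, length))
            for start in range(0, length, block_size)]


def split_task(func, data, block_size):
    """Turn one task on `data` into sub-tasks on consecutive sub-arrays."""
    return [(func, data[start:stop])
            for start, stop in partition_blocks(len(data), block_size)]


def scale(block):
    """Stand-in for a computational task operating on an array."""
    return [2 * x for x in block]


# One task on 10 elements becomes three sub-tasks of sizes 4, 4 and 2.
subtasks = split_task(scale, list(range(10)), block_size=4)
results = [func(block) for func, block in subtasks]
print(results)
```

Because the sub-tasks operate on disjoint sub-arrays, a runtime system is free to execute them on different execution units.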

We have developed a prototypical source-to-source compiler that supports our task annotations for C/C++. Our compiler takes as input an annotated serial task-based program and outputs, parametrized via PDL descriptors, code for a specific heterogeneous many-core target computing system. By varying the target PDL descriptor, our compiler can generate code for different target architecture configurations without the need to modify the source program. Experiments with a commonly used scientific kernel and a financial application on different configurations of a state-of-the-art CPU/GPU system demonstrate the potential of our approach. In these experiments, our framework targets a flexible heterogeneous runtime system [6], which is capable of scheduling computational tasks to the different execution units of a heterogeneous many-core system. The runtime system takes care of data dependencies between tasks, and if no data dependencies exist between a set of tasks, these tasks may be executed on different execution units in parallel. As a consequence, this approach can exploit multiple levels of parallelism: task parallelism, if two or more tasks are executed concurrently on different execution units, and data parallelism, if a task internally relies on a data-parallel execution model such as CUDA or OpenCL.
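
To make the role of data dependencies concrete, the following Python sketch groups tasks into "waves" such that tasks in the same wave have no data dependencies on each other and could therefore run in parallel. It is a simplified stand-in written for this illustration, not the scheduling algorithm of the runtime system [6]; the task names and the `(name, reads, writes)` representation are assumptions.

```python
def build_waves(tasks):
    """Group tasks into waves of mutually independent tasks.

    `tasks` is a list of (name, reads, writes) tuples in program order,
    where `reads` and `writes` are lists of data names.
    """
    level = {}        # task name -> index of the wave it belongs to
    last_writer = {}  # data name -> last task that wrote it
    readers = {}      # data name -> tasks that read it since the last write

    for name, reads, writes in tasks:
        deps = set()
        for data in reads:
            if data in last_writer:              # read-after-write
                deps.add(last_writer[data])
        for data in writes:
            if data in last_writer:              # write-after-write
                deps.add(last_writer[data])
            deps.update(readers.get(data, ()))   # write-after-read
        level[name] = 1 + max((level[d] for d in deps), default=-1)

        for data in writes:
            last_writer[data] = name
            readers[data] = set()
        for data in reads:
            if data not in writes:
                readers.setdefault(data, set()).add(name)

    waves = {}
    for name, _, _ in tasks:
        waves.setdefault(level[name], []).append(name)
    return [waves[i] for i in sorted(waves)]


tasks = [
    ("init_a", [], ["A"]),
    ("init_b", [], ["B"]),
    ("mul", ["A", "B"], ["C"]),
    ("norm", ["C"], ["C"]),
]
print(build_waves(tasks))
# prints [['init_a', 'init_b'], ['mul'], ['norm']]
```

Here `init_a` and `init_b` touch different data and land in the same wave, which corresponds to the task parallelism described above.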

This paper is structured as follows. Section 2 discusses the context and related work. Section 3 gives an overview of our approach and describes the platform description language. Section 4 introduces our programming framework for heterogeneous systems. Experimental results are presented in Section 5. Section 6 summarizes our approach and gives an outlook on future developments.

Section snippets

Related work

Programming heterogeneous systems is targeted by the OpenCL [3] and NVIDIA CUDA [7] projects. Both programming models use a similar hierarchical platform abstraction involving a fixed control relationship (host-device) to support portability between heterogeneous hardware elements. Our PDL can be seen as a generic approach to represent such platform patterns. It is not limited to a specific hierarchy of control relationships that may only be well suited for specialized classes of heterogeneous

Platform description language

In this section we give an overview of our approach and describe the hierarchical machine model and the corresponding XML-based platform description language (PDL).

Cascabel source-to-source translator

To show the applicability of the proposed PDL to a task-based programming model, we have developed Cascabel, a prototypical code-generation system for offloading computational tasks to heterogeneous processing units. The Cascabel programming model is based on sequential, high-level task-based input programs with additional source-code annotations. Annotations indicate function invocations (computational tasks) suitable for execution on possibly heterogeneous processing units. We define tasks as

Experiments

To evaluate our framework, we investigate translations from task-based sequential input programs to output programs tailored for parallel execution on a heterogeneous GPU-equipped system. Translation is based on the previously introduced source-code annotations in combination with different target platform descriptors. Each of the target platform descriptors expresses a different machine configuration for static task implementation variant selection and runtime-system configuration.
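
Static variant selection against different target descriptors can be pictured as follows. This is a minimal sketch, assuming that a target configuration is reduced to the set of available execution-unit kinds and that each implementation variant lists the kinds it requires; the task name, variant names and data layout are hypothetical and not taken from the framework.

```python
# Hypothetical registry: each task has several implementation variants,
# each with the kinds of execution units it requires.
VARIANTS = {
    "kernel": [
        {"name": "kernel_cpu", "requires": {"x86_64"}},
        {"name": "kernel_gpu", "requires": {"gpu"}},
    ],
}


def select_variants(task, available_kinds):
    """Return the variants whose requirements the target platform meets."""
    return [variant["name"]
            for variant in VARIANTS[task]
            if variant["requires"] <= available_kinds]


# Two hypothetical target configurations of the same machine.
cpu_only = {"x86_64"}
cpu_and_gpu = {"x86_64", "gpu"}

print(select_variants("kernel", cpu_only))     # prints ['kernel_cpu']
print(select_variants("kernel", cpu_and_gpu))  # prints ['kernel_cpu', 'kernel_gpu']
```

Changing only the target configuration changes which variants are kept, while the task itself stays untouched.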

Summary and outlook

The shift to heterogeneous many-core systems requires rethinking existing programming models and software development frameworks. The diversity of software libraries, runtime systems and architectural features inherent to such platforms raises many challenges for application developers and toolchains.

In this paper we have presented an XML-based platform description language (PDL) that makes it possible to capture key architectural patterns of commonly used heterogeneous computing systems.

Acknowledgment

We would like to thank the INRIA Runtime Team for providing StarPU. This work received funding from the EU under grant agreement No. 248481 (PEPPHER Project, www.peppher.eu).

References (29)

  • A.R. Brodtkorb et al., State-of-the-art in heterogeneous computing, Sci. Program. (2010)
  • R.D. Chamberlain, M.A. Franklin, E.J. Tyson, J. Buhler, S. Gayen, P. Crowley, J.H. Buckley, Application development on...
  • A. Munshi, The OpenCL specification, version 1.1, September...
  • K. Komatsu, K. Sato, Y. Arai, K. Koyama, H. Takizawa, H. Kobayashi, Evaluating performance and portability of OpenCL...
  • S. Rul, H. Vandierendonck, J. D’Haene, K. De Bosschere, An experimental study on performance portability of OpenCL...
  • C. Augonnet et al., StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Euro-Par 2009 Parallel Process. (2009)
  • J. Nickolls et al., Scalable parallel programming with CUDA, Queue (2008)
  • K. Fatahalian, D.R. Horn, T.J. Knight, L. Leem, M. Houston, J.Y. Park, M. Erez, M. Ren, A. Aiken, W.J. Dally, et al.,...
  • B. Alpern, L. Carter, J. Ferrante, Modeling parallel computers as memory hierarchies, in: Programming Models for...
  • Y. Yan et al., Hierarchical place trees: a portable abstraction for task parallelism and data movement, Lang. Compilers Parallel Comput. (2010)
  • M.D. Linderman, J.D. Collins, H. Wang, T.H. Meng, Merge: a programming model for heterogeneous multi-core systems, in:...
  • J.R. Wernsing et al., Elastic computing: a framework for transparent, portable, and adaptive multi-core heterogeneous computing, ACM SIGPLAN Notices (2010)
  • A. Duran et al., OmpSs: a proposal for programming heterogeneous multi-core architectures, Parallel Process. Lett. (2011)
  • R. Dolbeau, S. Bihan, F. Bodin, HMPP: a hybrid multi-core parallel programming environment, in: First Workshop on...