The PEPPHER composition tool: performance-aware composition for GPU-based systems

Abstract

The PEPPHER (EU FP7 project) component model defines the notions of components, interfaces and meta-data for homogeneous and heterogeneous parallel systems. In this paper, we describe and evaluate the PEPPHER composition tool, which explores an application’s components and their implementation variants, generates the necessary low-level code that interacts with the runtime system, and coordinates the native compilation and linking of the various code units into an overall application code optimized for performance. We discuss the concept of smart containers and its benefits for reducing dispatch overhead, exploiting implicit parallelism across component invocations, and optimizing data transfers at runtime. In an experimental evaluation with several applications, we demonstrate that the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath, for different usage scenarios on GPU-based systems.
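To make the composition mechanism concrete, the sketch below shows the kind of glue code such a tool generates, assuming a hypothetical vector_scale component with a user-provided CPU variant and a CUDA variant. All component and wrapper names are illustrative assumptions, not the tool’s actual output; only the StarPU calls correspond to the real runtime API [2].

```cpp
// Illustrative sketch (not the tool's actual generated code): glue for a
// hypothetical "vector_scale" component with two implementation variants.
#include <starpu.h>

// Implementation variants provided by the component programmer (assumed names).
void vector_scale_cpu(float *x, unsigned n, float a);   // plain CPU loop
void vector_scale_cuda(float *x, unsigned n, float a);  // launches a CUDA kernel; x is a device pointer

// Generated wrappers: the C-based runtime calls every variant through one
// generic signature (see note 2 below), so argument unpacking is emitted here.
static void vector_scale_cpu_wrapper(void *buffers[], void *cl_arg)
{
    float *x   = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float a;
    starpu_codelet_unpack_args(cl_arg, &a);
    vector_scale_cpu(x, n, a);
}

static void vector_scale_cuda_wrapper(void *buffers[], void *cl_arg)
{
    float *x   = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);  // device pointer on a CUDA worker
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float a;
    starpu_codelet_unpack_args(cl_arg, &a);
    vector_scale_cuda(x, n, a);
}

// One codelet bundles all variants; StarPU's scheduler picks one per invocation.
static struct starpu_codelet vector_scale_cl;

void vector_scale_init_codelet()
{
    vector_scale_cl.cpu_funcs[0]  = vector_scale_cpu_wrapper;
    vector_scale_cl.cuda_funcs[0] = vector_scale_cuda_wrapper;
    vector_scale_cl.nbuffers      = 1;
    vector_scale_cl.modes[0]      = STARPU_RW;
}

// Generated call-site stub: the application keeps its high-level call while
// the stub submits an asynchronous task on the registered data handle.
void vector_scale(starpu_data_handle_t xh, float a)
{
    starpu_task_insert(&vector_scale_cl,
                       STARPU_RW, xh,
                       STARPU_VALUE, &a, sizeof(a),
                       0);
}
```

In this sketch, starpu_init() and vector_scale_init_codelet() are assumed to have run before the first call; variant selection, scheduling, and the associated data transfers are then handled entirely by the runtime.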

Notes

  1. For demonstration purposes, we have used CUBLAS [4] and CUSP [5] components for CUDA implementations, as shown in Sect. 5.

  2. This is because the PEPPHER runtime system is C-based, and the C language does not permit calling functions whose argument types vary with the actual task being run.

  3. Read and write accesses to container data are distinguished by implementing proxy classes for element data in C++ [11]; a minimal illustration of this technique is sketched below.
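The following is a minimal, self-contained sketch of that proxy-class technique; all names (Vector, ElementProxy, ensure_valid_on_host, mark_host_modified) are hypothetical and not the actual smart-container API.

```cpp
#include <cstddef>
#include <vector>

template <typename T> class Vector;  // forward declaration of the smart container

// Proxy returned by Vector<T>::operator[]: reading converts the proxy to T,
// writing goes through operator=, so the container can tell the two apart.
template <typename T>
class ElementProxy {
public:
    ElementProxy(Vector<T> &owner, std::size_t i) : owner_(owner), idx_(i) {}

    operator T() const                        // read access
    {
        owner_.ensure_valid_on_host();        // may trigger a device-to-host transfer
        return owner_.raw()[idx_];
    }

    ElementProxy &operator=(const T &value)   // write access
    {
        owner_.ensure_valid_on_host();
        owner_.raw()[idx_] = value;
        owner_.mark_host_modified();          // device copies become stale
        return *this;
    }

private:
    Vector<T> &owner_;
    std::size_t idx_;
};

template <typename T>
class Vector {
public:
    explicit Vector(std::size_t n) : data_(n) {}
    ElementProxy<T> operator[](std::size_t i) { return ElementProxy<T>(*this, i); }

    // Hypothetical coherence hooks; a real smart container delegates to the runtime.
    void ensure_valid_on_host() { /* fetch the latest copy if a device holds it */ }
    void mark_host_modified()   { /* lazily invalidate device copies */ }
    T *raw() { return data_.data(); }

private:
    std::vector<T> data_;
};
```

With this pattern, v[i] = x dispatches to ElementProxy::operator= and is recorded as a write, while T y = v[i] goes through the conversion operator and is recorded as a read, so the container can keep host and device copies coherent without annotations at the call site.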

References

  1. Benkner S, Pllana S, Träff JL, Tsigas P, Dolinsky U, Augonnet C, Bachmayer B, Kessler C, Moloney D, Osipov V (2011) PEPPHER: efficient and productive usage of hybrid computing systems. IEEE Micro 31(5):28–41

  2. Augonnet C, Thibault S, Namyst R, Wacrenier PA (2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exper 23(2):187–198

  3. Che S, Boyer M, Meng J, Tarjan D, Sheaffer JW, Lee SH, Skadron K (2009) Rodinia: a benchmark suite for heterogeneous computing. In: IEEE international symposium on workload characterization (IISWC), pp 44–54

  4. NVIDIA Corporation (2012) CUBLAS library: NVIDIA CUDA basic linear algebra subroutines. http://developer.nvidia.com/cublas/

  5. Bell N, Garland M (2012) CUSP library v0.2: generic parallel algorithms for sparse matrix and graph computations. http://code.google.com/p/cusp-library/

  6. Asanovic K et al (2009) A view of the parallel computing landscape. Commun ACM 52(10):56–67

  7. Kessler CW, Löwe W (2012) Optimized composition of performance-aware parallel components. Concurr Comput Pract Exper 24(5):481–498

  8. Li L, Dastgeer U, Kessler C (2013) Adaptive off-line tuning for optimized composition of components for heterogeneous many-core systems. In: Seventh international workshop on automatic performance tuning (iWAPT-2012), Proc. VECPAR-2012 conference, pp 329–345

  9. Kicherer M, Buchty R, Karl W (2011) Cost-aware function migration in heterogeneous systems. In: Proceedings of conference on high performance and embedded architectures and compilers (HiPEAC), pp 137–145

  10. Kicherer M, Nowak F, Buchty R, Karl W (2012) Seamlessly portable applications: managing the diversity of modern heterogeneous systems. ACM Trans Archit Code Optim 8(4):42:1–42:20

  11. Alexandrescu A (2001) Modern C++ design: generic programming and design patterns applied. Addison-Wesley, Reading

  12. Park R (1992) Software size measurement: a framework for counting source statements. Tech. rep., Software Engineering Institute, Carnegie Mellon University, Pittsburgh

  13. Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1:1–1:25

  14. Ng R, Levoy M, Brédif M, Duval G, Horowitz M, Hanrahan P (2005) Light field photography with a hand-held plenoptic camera. Tech. rep., Stanford University, Stanford

  15. Augonnet C (2011) Scheduling tasks over multicore machines enhanced with accelerators: a runtime system’s perspective. PhD thesis, Université Bordeaux 1

  16. Ansel J, Chan C, Wong YL, Olszewski M, Zhao Q, Edelman A, Amarasinghe S (2009) PetaBricks: a language and compiler for algorithmic choice. In: Proceedings of conference on programming language design and implementation (PLDI)

  17. Wang PH, Collins JD, Chinya GN, Jiang H, Tian X, Girkar M, Yang NY, Lueh GY, Wang H (2007) EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system. In: Proceedings of conference on programming language design and implementation (PLDI), pp 156–166

  18. Linderman MD, Collins JD, Wang H, Meng THY (2008) Merge: a programming model for heterogeneous multi-core systems. In: Proceedings of international conference on architectural support for programming languages and operating systems (ASPLOS 2008), pp 287–296

  19. Huang SS, Hormati A, Bacon DF, Rabbah R (2008) Liquid metal: object-oriented programming across the hardware/software boundary. In: Proceedings of 22nd European conference on object-oriented programming (ECOOP), pp 76–103

  20. Wernsing JR, Stitt G (2010) Elastic computing: a framework for transparent, portable, and adaptive multi-core heterogeneous computing. In: Proceedings of conference on languages, compilers, and tools for embedded systems (LCTES), pp 115–124

  21. Chafi H, Sujeeth AK, Brown KJ, Lee H, Atreya AR, Olukotun K (2011) A domain-specific approach to heterogeneous parallelism. In: 16th symposium on principles and practice of parallel programming (PPoPP), pp 35–46

Acknowledgments

This work was funded by EU FP7, project PEPPHER, grant #248481 (http://www.peppher.eu), and by SeRC. We would like to thank the University of Vienna for providing access to their machine.

Author information

Corresponding author

Correspondence to Usman Dastgeer.

About this article

Cite this article

Dastgeer, U., Li, L. & Kessler, C. The PEPPHER composition tool: performance-aware composition for GPU-based systems. Computing 96, 1195–1211 (2014). https://doi.org/10.1007/s00607-013-0371-8
