Abstract
User-level components of applications can be made performance-aware by annotating them with performance models and other metadata. We present a component model and a composition framework for the automatically optimized composition of applications for modern GPU-based systems from such components, which may expose multiple implementation variants. The framework addresses the composition problem in an integrated manner and can perform global performance-aware composition across multiple invocations. We demonstrate several key features of our framework relating to performance-aware composition, including implementation selection both when performance characteristics are known (or learned) beforehand and when they are learned at runtime. We also demonstrate the hybrid execution capabilities of our framework on real applications. Furthermore, we present a bulk composition technique that makes better composition decisions by considering information about upcoming calls together with data-flow information extracted from the source program by static analysis. Bulk composition improves over the traditional greedy performance-aware policy, which considers only the current call for optimization.
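As a hypothetical illustration (not the framework's actual interface), the following C++ sketch shows the idea behind greedy, per-call implementation selection: each implementation variant of a component carries a simple performance model, and the dispatcher invokes the variant predicted to be fastest for the current call. All names here (Variant, select_variant, the predict and run members) are invented for the example.

    // Hypothetical sketch of greedy, per-call variant selection.
    #include <cstddef>
    #include <functional>
    #include <limits>
    #include <string>
    #include <vector>

    struct Variant {
        std::string name;                                       // e.g. "cpu_openmp", "gpu_cuda"
        std::function<double(std::size_t)> predict;             // predicted run time for problem size n
        std::function<void(const float*, std::size_t, float*)> run;  // the actual implementation
    };

    // Greedy policy: pick the variant with the lowest predicted cost for
    // this call only (the policy that bulk composition improves upon).
    // Assumes 'variants' is non-empty.
    const Variant& select_variant(const std::vector<Variant>& variants, std::size_t n) {
        const Variant* best = &variants.front();
        double best_cost = std::numeric_limits<double>::max();
        for (const auto& v : variants) {
            double cost = v.predict(n);
            if (cost < best_cost) { best_cost = cost; best = &v; }
        }
        return *best;
    }

A bulk composition policy, as described above, would instead weigh a window of upcoming calls together with the operand data flow, so that, for example, a locally slower GPU variant may still be preferred when it avoids transferring operands back to CPU memory.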







Notes
In this article, we use the terms implementation, implementation variant and variant interchangeably.
We consider a recursive function as a special scenario to avoid combinatorial explosion of the solution space.
We have not encountered any such scenario yet in any application that we have ported to our framework.
By default, the system generates composition code for our own GCF runtime library. The user can set the -starpu switch to generate code for the StarPU runtime system.
Registering a variable with the runtime system creates a unique data handle (recording its size, memory address, etc.) for that data in the runtime system, which can be used for controlling its state and data transfers; a minimal registration sketch is given after these notes.
A point represents a single execution with certain performance relevant properties.
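To illustrate the note on data registration, the following is a minimal sketch against the StarPU C API (one of the two runtimes the generated composition code can target). The vector, its size, and the surrounding main function are arbitrary, and STARPU_MAIN_RAM assumes StarPU 1.2 or later; the GCF runtime's own registration interface is not shown.

    // Minimal data-registration sketch using the StarPU C API (illustrative).
    #include <starpu.h>
    #include <vector>

    int main() {
        if (starpu_init(NULL) != 0) return 1;

        std::vector<float> x(1024, 1.0f);

        // Registering the vector yields a data handle that records its address,
        // element count and element size; the runtime uses the handle to track
        // the data's state and to schedule any CPU<->GPU transfers.
        starpu_data_handle_t handle;
        starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                    (uintptr_t)x.data(), x.size(), sizeof(float));

        // ... tasks accessing 'handle' would be submitted here ...

        starpu_data_unregister(handle);  // waits for pending work, releases the handle
        starpu_shutdown();
        return 0;
    }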