Simplifying programming and load balancing of data parallel applications on heterogeneous systems

Authors:

José Luis Bosque,

Ramón Beivide

GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit

Pages 42 - 51

https://doi.org/10.1145/2884045.2884051

Published: 12 March 2016

Abstract

Heterogeneous architectures have developed rapidly thanks to their excellent cost/performance ratio and low power consumption. However, heterogeneity significantly complicates both programming and the efficient use of resources. As a result, programmers have ended up assigning fixed roles to each kind of device: CPUs for sequential and management tasks, and GPUs for parallel work. This wastes computing power. Maat is a library for OpenCL programmers that enables the efficient execution of a single data-parallel kernel on all the available devices. It provides the programmer with an abstract view of the system, so that heterogeneous environments can be managed regardless of the underlying architecture, together with a set of load balancing methods that perform the data distribution. With Maat, programmers only need to develop a data-parallel kernel, select a load balancing method, and run it on the whole system. Experimental results show that Maat uses all the resources efficiently, independently of their number and nature. Provided the most appropriate method is selected, Maat achieves a speedup of up to 1.97 with two GPUs relative to a single GPU, and of more than 2 when the CPUs, which are much slower, also take part.
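To make the idea of a load balancing method that performs data distribution more concrete, the following is a minimal sketch of one possible static scheme. It is not Maat's API: the function `static_partition`, its parameters, and the relative speed values are hypothetical and chosen only for illustration. The sketch splits the work-items of a single data-parallel kernel into one contiguous chunk per device, sized in proportion to an assumed relative speed and rounded down to whole work-groups.

```python
from typing import List, Tuple


def static_partition(
    total_items: int,
    relative_speeds: List[float],
    group_size: int,
) -> List[Tuple[int, int]]:
    """Split [0, total_items) into one contiguous chunk per device.

    Hypothetical helper, not part of Maat. Every chunk except the last is
    proportional to the device's relative speed and rounded down to a
    multiple of group_size. Returns one (offset, size) pair per device.
    """
    if total_items < 0 or group_size <= 0:
        raise ValueError("total_items must be >= 0 and group_size > 0")
    if not relative_speeds or any(s <= 0 for s in relative_speeds):
        raise ValueError("relative_speeds must be non-empty and positive")

    total_speed = sum(relative_speeds)
    chunks = []
    offset = 0
    last = len(relative_speeds) - 1
    for i, speed in enumerate(relative_speeds):
        if i == last:
            # The last device absorbs whatever the rounding left over.
            size = total_items - offset
        else:
            ideal = total_items * speed / total_speed
            # Round down so the chunk contains only whole work-groups.
            size = int(ideal // group_size) * group_size
            size = min(size, total_items - offset)
        chunks.append((offset, size))
        offset += size
    return chunks


if __name__ == "__main__":
    # Assumed setup: two GPUs, each four times faster than the CPU.
    for device, (start, size) in enumerate(
        static_partition(1024, [4.0, 4.0, 1.0], 64)
    ):
        print(f"device {device}: offset={start} size={size}")
```

In an OpenCL host program, each (offset, size) pair would typically be passed as the `global_work_offset` and `global_work_size` arguments of `clEnqueueNDRangeKernel` on the corresponding device's command queue. A static split like this depends on the speed estimates being accurate, which is one reason a library might offer several load balancing methods to choose from.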

Index Terms

Simplifying programming and load balancing of data parallel applications on heterogeneous systems
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages

Information

Published In

GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit

March 2016

107 pages

ISBN:9781450341950

DOI:10.1145/2884045

Conference Chairs:
David Kaeli,
John Cavazos

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

March 12, 2016

Barcelona, Spain

Acceptance Rates

GPGPU '16 Paper Acceptance Rate 9 of 23 submissions, 39%;

Overall Acceptance Rate 57 of 129 submissions, 44%

Bibliometrics

Total citations: 32

Total downloads: 347

Downloads (last 12 months): 6

Downloads (last 6 weeks): 1

Reflects downloads up to 24 Jan 2025
