DOI: 10.1145/2884045.2884051
research-article

Simplifying programming and load balancing of data parallel applications on heterogeneous systems

Published: 12 March 2016

Abstract

Heterogeneous architectures have developed rapidly thanks to their excellent cost/performance ratio and low power consumption. However, heterogeneity significantly complicates both programming and the efficient use of resources. As a result, programmers have ended up assigning a fixed role to each kind of device: CPUs for sequential and management tasks, and GPUs for parallel work. This wastes computing power. Maat is a library for OpenCL programmers that enables the efficient execution of a single data-parallel kernel on all the available devices. It provides the programmer with an abstract view of the system, so that heterogeneous environments can be managed regardless of the underlying architecture, and a set of load balancing methods that perform the data distribution. With Maat, programmers only need to develop a data-parallel kernel, select a load balancing method, and run it on the whole system. Experimental results show that Maat utilizes all the resources efficiently, regardless of their number and nature. Provided the most appropriate method is selected, Maat achieves a speedup of up to 1.97 using two GPUs with respect to a single GPU, and of over 2 when the CPUs, which are much slower, also take part.
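To make the idea of data distribution across devices concrete, the following is a minimal sketch of one generic technique, static proportional partitioning of a one-dimensional index range. It is not Maat's API and not necessarily one of its load balancing methods: the function name `split_work`, the `device_speeds` weights, and the `group_size` alignment are all assumptions introduced here for illustration.

```python
def split_work(total_items, device_speeds, group_size=1):
    """Split the range [0, total_items) into one contiguous chunk per device.

    Hypothetical helper, not part of Maat. Each device receives a share
    proportional to its relative speed. Every chunk boundary except the
    final one is rounded down to a multiple of group_size, mimicking the
    need to hand whole work-groups to a device.
    Returns a list of (offset, size) tuples, one per device.
    """
    if total_items < 0 or group_size < 1:
        raise ValueError("total_items must be >= 0 and group_size >= 1")
    if not device_speeds or any(s <= 0 for s in device_speeds):
        raise ValueError("device_speeds must be non-empty and positive")

    total_speed = sum(device_speeds)
    chunks = []
    offset = 0
    cumulative = 0.0
    for i, speed in enumerate(device_speeds):
        cumulative += speed
        if i == len(device_speeds) - 1:
            # The last device absorbs any remainder left by rounding.
            end = total_items
        else:
            ideal_end = total_items * cumulative / total_speed
            end = int(ideal_end // group_size) * group_size
            end = max(end, offset)  # never move a boundary backwards
        chunks.append((offset, end - offset))
        offset = end
    return chunks


# Two fast devices and two slow ones (assumed relative speeds 4:4:1:1).
print(split_work(1000, [4, 4, 1, 1], group_size=10))
```

A static split like this is only as good as the speed estimates it is given. If the weights are wrong, or the cost per item is irregular, the slowest chunk determines the total run time, which is why the choice of balancing method matters for the speedups reported above.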

Published In

GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit
March 2016
107 pages
ISBN:9781450341950
DOI:10.1145/2884045

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OpenCL
  2. data parallel applications
  3. heterogeneous architectures
  4. load balancing

Qualifiers

  • Research-article

Conference

PPoPP '16

Acceptance Rates

GPGPU '16 Paper Acceptance Rate 9 of 23 submissions, 39%;
Overall Acceptance Rate 57 of 129 submissions, 44%
