DOI:10.1145/3229710.3229724

Graph Support and Scheduling for OpenCL on Heterogeneous Multi-core Systems

Published: 13 August 2018

Abstract

Computation on heterogeneous multi-core systems offers many opportunities for optimization, including compute resource scheduling, such as distributing workload between CPU and GPU, and finding the combination of tasks and compute devices that gives the best performance. Currently, OpenCL, the parallel programming standard for heterogeneous computing, consists mainly of low-level APIs for interacting with each vendor's runtime and hardware devices. Applying an efficient scheduling algorithm requires considering the overall execution flow and information about the OpenCL kernels. In this paper, we propose computational graph support for OpenCL. The framework features computational graphs that store the meta-data and execution dependencies of kernels. We then provide a scheduling framework for OpenCL programs in which kernel task scheduling is based on this graph information. In addition, we perform kernel code analysis to decide the target device and optimize the work-group size at runtime. Preliminary experimental results show that our scheme yields a significant performance improvement, achieving about a 1.59 times speedup relative to our neural network program baseline.
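To make the idea of graph-based kernel scheduling concrete, the following is a minimal sketch and not the framework described in the paper. It assumes a hypothetical `KernelNode` record holding per-device cost estimates and dependency names, orders the kernels with Kahn's topological sort, and then greedily places each kernel on the device where it would finish earliest. The cost numbers, device names, and kernel names are invented for illustration; kernel code analysis and work-group size tuning are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class KernelNode:
    """One kernel in the graph. All fields are hypothetical meta-data."""
    name: str
    cost: dict                                  # assumed run time per device
    deps: list = field(default_factory=list)    # kernels that must finish first


def topological_order(nodes):
    """Kahn's algorithm: emit a kernel once all of its dependencies are emitted."""
    indegree = {n.name: len(n.deps) for n in nodes}
    users = {n.name: [] for n in nodes}
    for n in nodes:
        for d in n.deps:
            users[d].append(n.name)
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for u in users[name]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle in kernel graph")
    return order


def schedule(nodes, devices=("cpu", "gpu")):
    """Greedy placement: pick the device with the earliest finish time."""
    by_name = {n.name: n for n in nodes}
    device_free = {d: 0.0 for d in devices}   # time at which each device is idle
    finish = {}                               # finish time of each placed kernel
    plan = []
    for name in topological_order(nodes):
        node = by_name[name]
        deps_done = max((finish[d] for d in node.deps), default=0.0)
        best = min(devices,
                   key=lambda d: max(device_free[d], deps_done) + node.cost[d])
        start = max(device_free[best], deps_done)
        finish[name] = start + node.cost[best]
        device_free[best] = finish[name]
        plan.append((name, best, start, finish[name]))
    return plan


# A made-up four-kernel graph: "conv" and "bias" both depend on "load".
graph = [
    KernelNode("load", {"cpu": 1.0, "gpu": 2.0}),
    KernelNode("conv", {"cpu": 8.0, "gpu": 2.0}, deps=["load"]),
    KernelNode("bias", {"cpu": 1.0, "gpu": 1.5}, deps=["load"]),
    KernelNode("sum", {"cpu": 1.0, "gpu": 1.5}, deps=["conv", "bias"]),
]

plan = schedule(graph)
for name, device, start, end in plan:
    print(f"{name}: {device} [{start}, {end}]")
```

In this toy graph the two independent kernels run concurrently on different devices, which is the kind of decision that is hard to make when kernels are enqueued one at a time without a view of their dependencies.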

Cited By

  • (2022) Multi-task scheduling framework for OpenCL programs on CPUs-GPUs heterogeneous platforms. Third International Conference on Electronics and Communication; Network and Computer Technology (ECNCT 2021). DOI:10.1117/12.2628558 (31). Online publication date: 7-Mar-2022
  • (2019) CLPKM. Journal of Systems Architecture: the EUROMICRO Journal, 98:C (53-62). DOI:10.1016/j.sysarc.2019.06.008. Online publication date: 1-Sep-2019

Published In

ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing
August 2018
409 pages
ISBN:9781450365239
DOI:10.1145/3229710
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OpenCL
  2. computational graph
  3. heterogeneous computing
  4. multi-core
  5. scheduling algorithm
  6. work-group optimization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '18 Comp

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
