research-article

Graph Support and Scheduling for OpenCL on Heterogeneous Multi-core Systems

Authors: Shih-Huan Chien, Yuan-Ming Chang, Chun-Chieh Yang, Yuan-Shin Hwang, Jenq-Kuen Lee

ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing

Article No.: 14, Pages 1 - 7

https://doi.org/10.1145/3229710.3229724

Published: 13 August 2018

Abstract

Computation on heterogeneous multi-core systems offers many opportunities for optimization, including compute resource scheduling, such as distributing workloads between the CPU and GPU, and finding the combination of tasks and compute devices that gives the best performance. OpenCL, the parallel programming standard for heterogeneous computing, currently consists mainly of low-level APIs for interacting with each vendor's runtime and hardware devices. Applying an efficient scheduling algorithm requires knowledge of the overall execution flow and of the OpenCL kernels themselves. In this paper, we propose computational graph support for OpenCL. The framework features computational graphs that store the metadata and execution dependencies of kernels. On top of this graph information, we provide a scheduling framework for OpenCL programs in which kernel tasks are scheduled according to the graph model. In addition, we analyze kernel code to decide the target device and optimize the work-group size at runtime. Preliminary experimental results show that our scheme achieves a speedup of about 1.59 times relative to our neural network program baseline.
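To make the idea of a computational graph that stores kernel metadata and execution dependencies more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `KernelNode` class, its fields, the kernel names, and the choice of Kahn's topological sort as the ordering step are all assumptions made for illustration, since the abstract does not specify the data structures or the scheduling algorithm.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class KernelNode:
    """One kernel in the graph. All fields are hypothetical metadata."""
    name: str
    global_size: int                           # total work-items (assumed)
    deps: list = field(default_factory=list)   # kernels that must finish first


def topological_order(nodes):
    """Return kernel names in an order that respects every dependency.

    Uses Kahn's algorithm; raises ValueError if the graph has a cycle.
    """
    indegree = {n.name: len(n.deps) for n in nodes}
    successors = {n.name: [] for n in nodes}
    for n in nodes:
        for dep in n.deps:
            if dep not in successors:
                raise KeyError(f"unknown dependency: {dep}")
            successors[dep].append(n.name)

    # Kernels with no pending dependencies are ready to be enqueued.
    ready = deque(n.name for n in nodes if indegree[n.name] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


# A toy neural-network-like pipeline; the kernel names are made up.
graph = [
    KernelNode("conv", 1024),
    KernelNode("relu", 1024, deps=["conv"]),
    KernelNode("pool", 256, deps=["relu"]),
    KernelNode("bias_init", 64),
    KernelNode("fc", 128, deps=["pool", "bias_init"]),
]
schedule = topological_order(graph)
print(schedule)
```

In this sketch, kernels that become ready at the same time (here `conv` and `bias_init`) have no dependency on each other, so a scheduler built on such a graph could in principle dispatch them to different devices; how devices are actually chosen is outside what this example models.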

Cited By

Wang H, Wang H, Wang S (2022). Multi-task scheduling framework for OpenCL programs on CPUs-GPUs heterogeneous platforms. Third International Conference on Electronics and Communication; Network and Computer Technology (ECNCT 2021), (31). Online publication date: 7-Mar-2022. https://doi.org/10.1117/12.2628558

Chiu M, You Y (2019). CLPKM. Journal of Systems Architecture: the EUROMICRO Journal, 98:C, (53-62). Online publication date: 1-Sep-2019. https://dl.acm.org/doi/10.1016/j.sysarc.2019.06.008

Information

Published In

ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing

August 2018

409 pages

ISBN:9781450365239

DOI:10.1145/3229710

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP '18 Comp: 47th International Conference on Parallel Processing Companion

August 13 - 16, 2018

Eugene, OR, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 130
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 3

Reflects downloads up to 15 Jan 2025
