ABSTRACT
GPUs are now widely used in computation-intensive applications such as image processing, deep learning, and artificial intelligence. Because these applications can be modeled as multiple GPU kernels, some of which are dependent on one another, an efficient method for scheduling dependent kernels on GPU cores is essential. Naively enforcing kernel dependences by executing the kernels in sequence degrades performance. Moreover, dependent kernels generally share data, so without proper scheduling they incur unnecessary memory accesses and copies. This paper proposes an efficient method for scheduling dependent kernels on GPUs. Preliminary experimental results show that the technique improves performance by 43% on average when combined with appropriate memory write-back policies.
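The paper's scheduler itself is not reproduced here; as a point of reference, the minimal CUDA sketch below illustrates the baseline idea the abstract alludes to: expressing the dependence between two kernels through stream ordering, so the shared intermediate buffer stays resident in device memory instead of being synchronized and copied back to the host between kernels. The kernel names (`scale`, `offset`), problem size, and launch configuration are illustrative assumptions, not the paper's code.

```
// Sketch: two dependent kernels chained on one CUDA stream.
// Stream ordering preserves the producer/consumer dependence, so the
// intermediate data (d_buf) never leaves device memory between kernels.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float *buf, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= factor;          // producer: writes shared data
}

__global__ void offset(float *buf, int n, float delta) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += delta;           // consumer: reads producer's output
}

int main() {
    const int n = 1 << 20;                // illustrative problem size
    const size_t bytes = n * sizeof(float);

    float *h_buf = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;

    float *d_buf;
    cudaMalloc(&d_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // One copy in, two dependent kernels, one copy out. Work issued to
    // the same stream executes in issue order, so the dependence is
    // honored without a host-side sync or an intermediate host copy.
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
    int threads = 256, blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads, 0, stream>>>(d_buf, n, 2.0f);
    offset<<<blocks, threads, 0, stream>>>(d_buf, n, 1.0f);
    cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("h_buf[0] = %f (expected 3.0)\n", h_buf[0]);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
```

Even this baseline avoids the redundant copies the abstract warns about; the paper's contribution goes further by scheduling such dependent kernels across GPU cores and pairing the schedule with memory write-back policies.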
Index Terms
- Scheduling Methods to Optimize Dependent Programs for GPU Architecture