DOI: 10.1145/2616498.2616547

Runtime Pipeline Scheduling System for Heterogeneous Architectures

Published: 13 July 2014

Abstract

Heterogeneous architectures can improve the performance of applications with computationally intensive, data-parallel operations. Even when these architectures reduce application execution time, opportunities for further improvement remain because the memory hierarchies of the central processor cores and the graphics processor cores are separate. Applications executing on heterogeneous architectures must allocate space in GPU global memory, copy input data to it, invoke kernels, and copy results back to CPU memory. This scheme does not overlap inter-memory data transfers with GPU computation, which increases application execution time. This research presents a software architecture with a runtime pipeline system for GPU input/output scheduling that acts as a bidirectional interface between the GPU computing application and the physical device. The main aim of the system is to reduce the impact of the processor-memory performance gap by overlapping device I/O with computation. Evaluation using application benchmarks shows speedups above 2x with respect to baseline, non-streamed GPU execution.
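The overlap technique the abstract describes can be illustrated with CUDA streams: the input is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued on a separate stream so the hardware can overlap one stream's transfers with another's computation. This is a minimal sketch of the general technique, not the paper's actual scheduler; the kernel name, chunk size, and stream count are illustrative assumptions.

```cuda
// Sketch: chunked, multi-stream overlap of transfers and compute.
// scale_kernel, NSTREAMS, and CHUNK are illustrative, not from the paper.
#include <cuda_runtime.h>

__global__ void scale_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main(void) {
    const int NSTREAMS = 4, CHUNK = 1 << 20, N = NSTREAMS * CHUNK;
    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));  // pinned memory, required for async copies
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t s[NSTREAMS];
    for (int i = 0; i < NSTREAMS; ++i) cudaStreamCreate(&s[i]);

    // Each stream pipelines H2D copy -> kernel -> D2H copy on its own chunk;
    // copies in stream i can overlap kernel execution in stream j.
    for (int i = 0; i < NSTREAMS; ++i) {
        size_t off = (size_t)i * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[i]);
        scale_kernel<<<(CHUNK + 255) / 256, 256, 0, s[i]>>>(d + off, CHUNK);
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < NSTREAMS; ++i) cudaStreamDestroy(s[i]);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```

Note the use of `cudaMallocHost` rather than `malloc`: `cudaMemcpyAsync` only overlaps with computation when the host buffer is pinned, which is why runtime systems of the kind the paper describes manage staging buffers on behalf of the application.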



Published In

XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
July 2014
445 pages
ISBN:9781450328937
DOI:10.1145/2616498
  • General Chair: Scott Lathrop
  • Program Chair: Jay Alameda
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • NSF: National Science Foundation
  • Drexel University
  • Indiana University

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Concurrency
  2. heterogeneous architecture
  3. overlapping
  4. pipeline
  5. runtime
  6. scheduling
  7. streams

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

XSEDE '14

Acceptance Rates

XSEDE '14 Paper Acceptance Rate: 80 of 120 submissions, 67%
Overall Acceptance Rate: 129 of 190 submissions, 68%

