DOI: 10.1145/2616498.2616547

Runtime Pipeline Scheduling System for Heterogeneous Architectures

Published: 13 July 2014

Abstract

Heterogeneous architectures can improve the performance of applications with computationally intensive, data-parallel operations. Even when these architectures reduce application execution time, opportunities for further improvement remain because the memory hierarchies of the central processor cores and the graphics processor cores are separate. Applications executing on heterogeneous architectures must allocate space in GPU global memory, copy input data to it, invoke kernels, and copy results back to CPU memory. This scheme does not overlap inter-memory data transfers with GPU computation, which increases application execution time. This research presents a software architecture with a runtime pipeline system for GPU input/output scheduling that acts as a bidirectional interface between the GPU computing application and the physical device. The main aim of the system is to reduce the impact of the processor-memory performance gap by overlapping device I/O with computation. Evaluation using application benchmarks shows speedups above 2x with respect to baseline, non-streamed GPU execution.
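The overlap technique the abstract describes can be illustrated with CUDA streams: the input is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued on a separate stream so the hardware can overlap one stream's transfers with another's computation. This is a minimal sketch of the general technique, not the paper's actual scheduler; the kernel name, chunk size, and stream count are illustrative assumptions.

```cuda
// Sketch: chunked, multi-stream overlap of transfers and compute.
// scale_kernel, NSTREAMS, and CHUNK are illustrative, not from the paper.
#include <cuda_runtime.h>

__global__ void scale_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main(void) {
    const int NSTREAMS = 4, CHUNK = 1 << 20, N = NSTREAMS * CHUNK;
    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));  // pinned memory, required for async copies
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t s[NSTREAMS];
    for (int i = 0; i < NSTREAMS; ++i) cudaStreamCreate(&s[i]);

    // Each stream pipelines H2D copy -> kernel -> D2H copy on its own chunk;
    // copies in stream i can overlap kernel execution in stream j.
    for (int i = 0; i < NSTREAMS; ++i) {
        size_t off = (size_t)i * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[i]);
        scale_kernel<<<(CHUNK + 255) / 256, 256, 0, s[i]>>>(d + off, CHUNK);
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < NSTREAMS; ++i) cudaStreamDestroy(s[i]);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```

Note the use of `cudaMallocHost` rather than `malloc`: `cudaMemcpyAsync` only overlaps with computation when the host buffer is pinned, which is why runtime systems of the kind the paper describes manage staging buffers on behalf of the application.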



Published In

XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
July 2014
445 pages
ISBN:9781450328937
DOI:10.1145/2616498
  • General Chair: Scott Lathrop
  • Program Chair: Jay Alameda
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • NSF: National Science Foundation
  • Drexel University
  • Indiana University

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Concurrency
  2. heterogeneous architecture
  3. overlapping
  4. pipeline
  5. runtime
  6. scheduling
  7. streams

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

XSEDE '14

Acceptance Rates

XSEDE '14 Paper Acceptance Rate: 80 of 120 submissions, 67%
Overall Acceptance Rate: 129 of 190 submissions, 68%

