column

Towards efficient GPU sharing on multicore processors

Authors:

Lingyuan Wang,

Miaoqing Huang,

Tarek El-Ghazawi

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 2

Pages 119 - 124

https://doi.org/10.1145/2381056.2381081

Published: 08 October 2012

Abstract

Scalable systems employing a mix of GPUs with CPUs are becoming increasingly prevalent in high-performance computing. The presence of such accelerators introduces significant challenges and complexities to both language developers and end users. This paper provides a close study of efficient coordination mechanisms to handle parallel requests from multiple hosts of control to a GPU under hybrid programming. Using a set of microbenchmarks and applications on a GPU cluster, we show that thread and process-based context hosting have different tradeoffs. Experimental results on application benchmarks suggest that both thread-based context funneling and process-based context switching natively perform similarly on the latest Fermi GPUs, while manually guided context funneling is currently the best way to achieve optimal performance.
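The funneling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes one possible reading of "context funneling": several host threads hand their GPU requests to a single owner thread, which is the only one that touches the device context. The names `FunnelledDevice`, `fake_kernel`, and `run_hosts` are hypothetical, and the "kernel" is a plain Python function standing in for a real GPU launch; this is not the authors' implementation.

```python
import queue
import threading
from concurrent.futures import Future


class FunnelledDevice:
    """Hypothetical stand-in for a GPU context owned by exactly one thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self._owner = threading.Thread(target=self._serve, daemon=True)
        self._owner.start()

    def _serve(self):
        # Only this thread ever executes "device" work, so no context
        # switch between hosts is needed.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            fn, args, fut = item
            try:
                fut.set_result(fn(*args))
            except Exception as exc:
                fut.set_exception(exc)

    def submit(self, fn, *args):
        """Called from any host thread; returns a Future for the result."""
        fut = Future()
        self._requests.put((fn, args, fut))
        return fut

    def close(self):
        self._requests.put(None)
        self._owner.join()


def fake_kernel(data):
    # Placeholder for a kernel launch (assumption): doubles each element
    # and reports which thread actually ran it.
    return threading.get_ident(), [x * 2 for x in data]


def run_hosts(num_hosts=4):
    """Spawn several host threads that all funnel work through one owner."""
    device = FunnelledDevice()
    results = [None] * num_hosts

    def host(rank):
        results[rank] = device.submit(fake_kernel, [rank, rank + 1]).result()

    threads = [threading.Thread(target=host, args=(r,)) for r in range(num_hosts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    device.close()
    return results
```

Calling `run_hosts(4)` returns one `(thread_id, output)` pair per host; every pair carries the same thread id, which shows that all requests were serviced by the single owner thread regardless of which host issued them.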

Published In

ACM SIGMETRICS Performance Evaluation Review Volume 40, Issue 2

September 2012

129 pages

ISSN:0163-5999

DOI:10.1145/2381056

Publisher

Association for Computing Machinery

New York, NY, United States
