Towards efficient GPU sharing on multicore processors

Published: 08 October 2012

Abstract

Scalable systems employing a mix of GPUs with CPUs are becoming increasingly prevalent in high-performance computing. The presence of such accelerators introduces significant challenges and complexities to both language developers and end users. This paper provides a close study of efficient coordination mechanisms to handle parallel requests from multiple hosts of control to a GPU under hybrid programming. Using a set of microbenchmarks and applications on a GPU cluster, we show that thread and process-based context hosting have different tradeoffs. Experimental results on application benchmarks suggest that both thread-based context funneling and process-based context switching natively perform similarly on the latest Fermi GPUs, while manually guided context funneling is currently the best way to achieve optimal performance.
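As a rough illustration of the funneling idea named in the abstract, the sketch below routes requests from several host threads through a single worker thread that would be the sole owner of the GPU context, so the device never has to switch between contexts. This is a minimal sketch under stated assumptions, not the authors' implementation: `FunnelWorker`, `fake_kernel`, and `run_hosts` are hypothetical names, and the "kernel" is an ordinary Python function standing in for a real device launch.

```python
import queue
import threading


class FunnelWorker:
    """Hypothetical funnel: one thread owns the (simulated) GPU context."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Only this thread would touch the device, so every request
        # runs inside the same context.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            kernel, args, reply = item
            try:
                reply.put((True, kernel(*args)))
            except Exception as exc:  # hand the error back to the caller
                reply.put((False, exc))

    def submit(self, kernel, *args):
        """Called by any host thread; blocks until the result is ready."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((kernel, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._requests.put(None)
        self._thread.join()


def fake_kernel(xs, scale):
    # Assumption: stand-in for a real kernel launch on the device.
    return [scale * x for x in xs]


def run_hosts(num_hosts=4):
    """Start num_hosts host threads that all share one funnel."""
    worker = FunnelWorker()
    results = [None] * num_hosts

    def host(rank):
        results[rank] = worker.submit(fake_kernel, [rank, rank + 1], 2)

    threads = [threading.Thread(target=host, args=(r,)) for r in range(num_hosts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    worker.close()
    return results
```

Calling `run_hosts(3)` starts three host threads whose requests are all served by the one funnel thread. The process-based alternative mentioned in the abstract would instead give each host its own context and leave the switching to the driver.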

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 2
September 2012
129 pages
ISSN: 0163-5999
DOI: 10.1145/2381056

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. UPC
  3. hybrid parallel programming
  4. multicore

Qualifiers

  • Column

Cited By

  • (2024) GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1-16. DOI: 10.1109/ISCA59077.2024.00011. Online publication date: 29-Jun-2024
  • (2019) Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors. ACM Transactions on Architecture and Code Optimization 16:4, pp. 1-27. DOI: 10.1145/3368304. Online publication date: 17-Dec-2019
  • (2016) A High Performance Parallel and Heterogeneous Approach to Narrowband Beamforming. IEEE Transactions on Parallel and Distributed Systems 27:8, pp. 2196-2207. DOI: 10.1109/TPDS.2015.2494038. Online publication date: 13-Jul-2016
  • (2014) Scheduling multi-tenant cloud workloads on accelerator-based systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 513-524. DOI: 10.1109/SC.2014.47. Online publication date: 16-Nov-2014
  • (2013) Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems. 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, pp. 399-406. DOI: 10.1109/HPCC.and.EUC.2013.64. Online publication date: Nov-2013
  • (2011) Towards efficient GPU sharing on multicore processors. Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems, pp. 23-24. DOI: 10.1145/2088457.2088473. Online publication date: 13-Nov-2011
