DOI: 10.1145/1996130.1996160

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

Published: 08 June 2011

Abstract

Driven by the emergence of GPUs as a major player in high-performance computing and by the rapidly growing popularity of cloud environments, cloud providers now offer GPU instances. The use of GPUs in the cloud, however, is still in its initial stages, and the challenge of making the GPU a truly shared resource in the cloud has not yet been addressed.
This paper presents a framework that enables applications executing within virtual machines to transparently share one or more GPUs. Our contributions are twofold: we extend open-source GPU virtualization software to include efficient GPU sharing, and we propose solutions to the conceptual problem of GPU kernel consolidation. In particular, we introduce a method for computing the affinity score between two or more kernels, which indicates the potential performance improvement from consolidating them. In addition, we explore molding as a means to achieve efficient GPU sharing even for kernels with high or conflicting resource requirements. We use these concepts to develop an algorithm that efficiently maps a set of kernels onto a pair of GPUs. We extensively evaluate our framework using eight popular GPU kernels and two Fermi GPUs. We find that even when contention is high our consolidation algorithm is effective in improving throughput, and that the runtime overhead of our framework is low.
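To make the consolidation idea concrete, the Python sketch below shows one possible shape of an affinity-driven scheduler for a pair of GPUs. It is not the framework described in the paper: the KernelProfile fields, the affinity formula, the mold() heuristic, and the 0.6 pairing threshold are assumptions introduced here purely for illustration; the paper defines its own affinity score, molding strategy, and mapping algorithm.

# Illustrative sketch only: the profile fields, the affinity formula, the
# mold() heuristic, and the pairing threshold are assumptions made for this
# example, not the paper's actual definitions.
from dataclasses import dataclass, replace
from itertools import combinations


@dataclass(frozen=True)
class KernelProfile:
    name: str
    compute_util: float  # assumed fraction of SM compute capacity used (0..1)
    memory_util: float   # assumed fraction of memory bandwidth used (0..1)
    blocks: int          # number of thread blocks in the launch grid


def affinity(a: KernelProfile, b: KernelProfile) -> float:
    """Toy affinity score: kernels stressing different resources pair well.

    Returns a value in (0, 1]; higher suggests consolidation is more promising.
    """
    compute_pressure = a.compute_util + b.compute_util
    memory_pressure = a.memory_util + b.memory_util
    # Penalize pairs whose combined demand overflows either resource.
    return 1.0 / max(compute_pressure, memory_pressure, 1.0)


def mold(k: KernelProfile, factor: float = 0.5) -> KernelProfile:
    """Toy molding step: shrink the launch so the kernel leaves room for a partner.

    Optimistically assumes resource usage scales linearly with grid size.
    """
    return replace(
        k,
        blocks=max(1, int(k.blocks * factor)),
        compute_util=k.compute_util * factor,
        memory_util=k.memory_util * factor,
    )


def consolidate(kernels, threshold: float = 0.6):
    """Greedily pair kernels by affinity, mold poor pairs, round-robin onto two GPUs."""
    pending = list(kernels)
    units = []  # each unit is a tuple of kernels co-scheduled on one GPU
    while len(pending) >= 2:
        # Pick the remaining pair with the highest affinity score.
        a, b = max(combinations(pending, 2), key=lambda pair: affinity(*pair))
        pending.remove(a)
        pending.remove(b)
        if affinity(a, b) < threshold:
            # Conflicting resource demands: mold the heavier kernel first.
            if a.compute_util + a.memory_util >= b.compute_util + b.memory_util:
                a = mold(a)
            else:
                b = mold(b)
        units.append((a, b))
    units.extend((k,) for k in pending)  # a leftover kernel runs alone
    # Distribute the consolidated units across the two GPUs.
    schedule = {0: [], 1: []}
    for i, unit in enumerate(units):
        schedule[i % 2].append(tuple(k.name for k in unit))
    return schedule


if __name__ == "__main__":
    # Hypothetical profiles; the paper's eight benchmark kernels are not listed here.
    workload = [
        KernelProfile("compute_bound_A", compute_util=0.9, memory_util=0.3, blocks=4096),
        KernelProfile("memory_bound_B", compute_util=0.3, memory_util=0.8, blocks=2048),
        KernelProfile("balanced_C", compute_util=0.6, memory_util=0.6, blocks=1024),
    ]
    print(consolidate(workload))

The design point the sketch mirrors is the one the abstract argues for: pair kernels whose resource demands are complementary, and mold (shrink) a launch when a pair's combined demand would oversubscribe a GPU.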


Published In

HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing
June 2011
296 pages
ISBN: 9781450305525
DOI: 10.1145/1996130


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cloud computing
  2. cuda
  3. gpu
  4. virtualization

Qualifiers

  • Research-article

Conference

HPDC '11

Acceptance Rates

Overall acceptance rate: 166 of 966 submissions (17%)


Cited By

  • EnergAt: Fine-Grained Energy Attribution for Multi-Tenancy. ACM SIGEnergy Energy Informatics Review 4(3), 18-25. DOI: 10.1145/3698365.3698369. Online publication date: 1-Jul-2024.
  • SMSS: Stateful Model Serving in Metaverse With Serverless Computing and GPU Sharing. IEEE Journal on Selected Areas in Communications 42(3), 799-811. DOI: 10.1109/JSAC.2023.3345401. Online publication date: Mar-2024.
  • Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1444-1456. DOI: 10.1145/3611643.3616365. Online publication date: 30-Nov-2023.
  • Disaggregated GPU Acceleration for Serverless Applications. ACM SIGOPS Operating Systems Review 57(1), 10-20. DOI: 10.1145/3606557.3606560. Online publication date: 28-Jun-2023.
  • EnergAt: Fine-Grained Energy Attribution for Multi-Tenancy. Proceedings of the 2nd Workshop on Sustainable Computer Systems, 1-8. DOI: 10.1145/3604930.3605716. Online publication date: 9-Jul-2023.
  • Gemini: Enabling Multi-Tenant GPU Sharing Based on Kernel Burst Estimation. IEEE Transactions on Cloud Computing 11(1), 854-867. DOI: 10.1109/TCC.2021.3119205. Online publication date: 1-Jan-2023.
  • GPUPool. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 317-332. DOI: 10.1145/3559009.3569650. Online publication date: 8-Oct-2022.
  • TCUDA: A QoS-based GPU Sharing Framework for Autonomous Navigation Systems. 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 1-10. DOI: 10.1109/SBAC-PAD55451.2022.00011. Online publication date: Nov-2022.
  • DGSF: Disaggregated GPUs for Serverless Functions. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 739-750. DOI: 10.1109/IPDPS53621.2022.00077. Online publication date: May-2022.
  • A Provenance-based Execution Strategy for Variant GPU-accelerated Scientific Workflows in Clouds. Journal of Grid Computing 20(4). DOI: 10.1007/s10723-022-09625-y. Online publication date: 1-Dec-2022.
