KubeGPU: efficient sharing and isolation mechanisms for GPU resource management in container cloud

The Journal of Supercomputing

Abstract

With a growing number of containerized applications, such as high-performance computing and deep learning workloads, starting to rely on GPUs, efficient GPU support in container clouds has become essential. While GPU sharing has been studied extensively for virtual machines, little work addresses containers. Existing systems deploy containers with a single, fixed GPU virtualization technique, such as GPU pass-through or API forwarding, and do not optimize remote GPU virtualization. Because container resource requirements and GPU virtualization techniques are dynamic and heterogeneous, and because of communication overhead and resource racing, these limitations lead to low system throughput and degraded container performance. We therefore designed and implemented KubeGPU, which extends Kubernetes to enable GPU sharing with an adaptive sharing strategy. This strategy lets KubeGPU dynamically choose a GPU virtualization technique when deploying each container, based on the available GPU resources and the container's configuration parameters (such as its GPU resource requirements), to achieve good container performance and system throughput. In addition, a network-aware scheduling approach and fine-grained allocation of remote GPU resources are proposed to optimize remote GPU virtualization. Finally, using representative real-world HPC and deep learning workloads, we demonstrate the superiority of KubeGPU over existing systems and its effectiveness in minimizing communication overhead and eliminating remote GPU resource racing.
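
The paper's actual decision logic is not reproduced on this page, but the idea behind the adaptive sharing strategy can be sketched. The Go code below is a minimal, illustrative sketch only: the names (VirtMode, GPUState, chooseVirtMode), the three-way choice between pass-through, local API forwarding, and remote GPU virtualization, and the memory-based fit test are all assumptions for illustration, not KubeGPU's implementation.

    package main

    import "fmt"

    type VirtMode int

    const (
        PassThrough  VirtMode = iota // whole local GPU: best performance
        LocalShared                  // API forwarding on a local GPU
        RemoteShared                 // remote GPU virtualization over the network
    )

    type GPUState struct {
        Local      bool // true if the GPU sits on the scheduling node
        FreeMemory int  // MiB still unallocated on this GPU
        Idle       bool // no container scheduled on it (see Note 3 below)
    }

    // chooseVirtMode picks a virtualization technique for one container
    // from its GPU memory request and the cluster's GPU state, preferring
    // local resources to avoid remote communication overhead.
    func chooseVirtMode(requestMiB int, gpus []GPUState) (VirtMode, bool) {
        for _, g := range gpus { // an idle local GPU can be passed through whole
            if g.Local && g.Idle && g.FreeMemory >= requestMiB {
                return PassThrough, true
            }
        }
        for _, g := range gpus { // otherwise share a busy local GPU
            if g.Local && g.FreeMemory >= requestMiB {
                return LocalShared, true
            }
        }
        for _, g := range gpus { // last resort: a remote GPU over the network
            if !g.Local && g.FreeMemory >= requestMiB {
                return RemoteShared, true
            }
        }
        return 0, false // no fit: leave the container pending
    }

    func main() {
        gpus := []GPUState{
            {Local: true, FreeMemory: 2048},
            {Local: false, FreeMemory: 16384, Idle: true},
        }
        mode, ok := chooseVirtMode(4096, gpus)
        fmt.Println(mode, ok) // prints "2 true", i.e. RemoteShared
    }

In this toy example, a container requesting 4096 MiB falls through to the remote GPU because the only local GPU cannot satisfy the request, mirroring the dynamic choice the abstract describes.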


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. The details of network mode and network device are described in Sect. 2.3.

  2. Transparency means that KubeGPU can be integrated seamlessly with other Kubernetes components to optimize remote GPU virtualization: instead of replacing the installed network plugin, KubeGPU allocates to containers whichever available network mode offers the better network performance.

  3. An idle GPU Device Resource is a GPU with no container scheduled on it (a minimal filtering sketch follows these notes).

  4. KubeShare is incompatible with GROMACS and therefore introduces more performance overhead than GaiaGPU; similarly, GaiaGPU is incompatible with PyTorch MNIST. Here, incompatibility means that the resource allocation strategy of the GPU management framework cannot meet the resource demand of the container application.
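
As a minimal illustration of the definition in Note 3, the Go sketch below filters a device list down to its idle members. The Device type and its ScheduledContainers field are hypothetical stand-ins for KubeGPU's internal bookkeeping, not the paper's data structures.

    package main

    import "fmt"

    // Device stands in for one GPU Device Resource.
    type Device struct {
        ID                  string
        ScheduledContainers []string // IDs of containers placed on this GPU
    }

    // idleDevices keeps only the GPUs with no container scheduled on them,
    // i.e. the devices Note 3 calls idle.
    func idleDevices(devices []Device) []Device {
        var idle []Device
        for _, d := range devices {
            if len(d.ScheduledContainers) == 0 {
                idle = append(idle, d)
            }
        }
        return idle
    }

    func main() {
        devs := []Device{
            {ID: "gpu-0"},
            {ID: "gpu-1", ScheduledContainers: []string{"c-42"}},
        }
        fmt.Println(idleDevices(devs)) // [{gpu-0 []}]
    }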

References

  1. Al Jawarneh IM, Bellavista P, Bosi F, Foschini L, Martuscelli G, Montanari R, Palopoli A (2019) Container orchestration engines: a thorough functional and performance comparison. In: ICC 2019-2019 IEEE International Conference on Communications (ICC), pp 1–6. IEEE

  2. Hindman B, Konwinski A, Zaharia M, Ghodsi A, Joseph AD, Katz RH, Shenker S, Stoica I (2011) Mesos: a platform for fine-grained resource sharing in the data center. In: NSDI, 11:22–22

  3. Red Hat OpenShift makes container orchestration easier (2021) https://www.redhat.com/en/technologies/cloud-computing/openshift. Accessed 29 Mar

  4. Swarm mode overview (2021) https://docs.docker.com/engine/swarm/. Accessed 29 Mar

  5. Kubernetes (2021) https://github.com/kubernetes/kubernetes. Accessed 18 Nov

  6. Altintas I, Marcus K, Nealey I, Sellars SL, Graham J, Mishin D, Polizzi J, Crawl D, DeFanti T, Smarr L (2019) Workflow-driven distributed machine learning in CHASE-CI: a cognitive hardware and software ecosystem community infrastructure. In: 2019 IEEE international parallel and distributed processing symposium workshops (IPDPSW), pp 865–873. IEEE

  7. Managing Resources for Containers (2021) https://kubernetes.io/docs/concepts/configuration/manage-resources-containers. Accessed 22 Oct

  8. Yoon DH, Han Y (2020) Parallel power flow computation trends and applications: a review focusing on GPU. Energies 13(9):2147

  9. Hong CH, Spence I, Nikolopoulos DS (2017) GPU virtualization and scheduling methods: a comprehensive survey. ACM Comput Surv (CSUR) 50(3):1–37

  10. Naranjo DM, Risco S, de Alfonso C, Pérez A, Blanquer I, Moltó G (2020) Accelerated serverless computing based on GPU virtualization. J Parallel Distrib Comput 139:32–42

  11. Silla F, Prades J, Iserte S, Reano C (2016) Remote GPU virtualization: is it useful? In: 2016 2nd IEEE international workshop on high-performance interconnection networks in the exascale and big-data era (HiPINEB), pp 41–48. IEEE

  12. Thinakaran P, Gunasekaran JR, Sharma B, Kandemir MT, Das CR (2019) Kube-Knots: resource harvesting through dynamic container orchestration in GPU-based datacenters. In: 2019 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, pp 1–13

  13. Lu Q, Yao J, Guan H, Gao P (2019) gQoS: a QoS-oriented GPU virtualization with adaptive capacity sharing. IEEE Trans Parallel Distrib Syst 31(4):843–855

  14. Gonzalez NM, Elengikal T (2021) Transparent I/O-aware GPU virtualization for efficient resource consolidation. In: 2021 IEEE international parallel and distributed processing symposium (IPDPS), pp 131–140. IEEE

  15. Tang D, Li L, Ma J, Liu X, Qi Z, Guan H (2021) gRemote: cloud rendering on GPU resource pool based on API-forwarding. J Syst Archit 116:102055

  16. Song S, Deng L, Gong J, Luo H (2018) Gaia scheduler: a Kubernetes-based scheduler framework. In: 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). IEEE, pp 252–259

  17. cGPU overview (2021) https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/cgpu-overview. Accessed 22 Sep

  18. k8s-device-plugin (2021) https://github.com/NVIDIA/k8s-device-plugin. Accessed 13 Nov

  19. Reaño C, Silla F, Shainer G, Schultz S (2015) Local and remote GPUs perform similar with EDR 100G InfiniBand. In: Proceedings of the industrial track of the 16th international middleware conference, pp 1–7

  20. Reaño C, Silla F (2016) Reducing the performance gap of remote GPU virtualization with InfiniBand Connect-IB. In: 2016 IEEE symposium on computers and communication (ISCC), pp 920–925. IEEE

  21. Reaño C, Silla F (2017) A comparative performance analysis of remote GPU virtualization over three generations of GPUs. In: 2017 46th international conference on parallel processing workshops (ICPPW), pp 121–128. IEEE

  22. Qi S, Kulkarni SG, Ramakrishnan K (2020) Understanding container network interface plugins: design considerations and performance. In: 2020 IEEE international symposium on local and metropolitan area networks (LANMAN), pp 1–6. IEEE

  23. Xu C, Rajamani K, Felter W (2018) NBWGuard: realizing network QoS for Kubernetes. In: Proceedings of the 19th international middleware conference industry, pp 32–38

  24. Deepomatic (2021) https://github.com/Deepomatic/shared-gpu-nvidia-k8s-device-plugin. Accessed 12 Oct

  25. gpushare-scheduler-extender (2021) https://github.com/AliyunContainerService/gpushare-scheduler-extender. Accessed 4 Nov

  26. Kang D, Jun TJ, Kim D, Kim J, Kim D (2017) ConVGPU: GPU management middleware in container-based virtualized environment. In: 2017 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, pp 301–309

  27. Yeh TA, Chen HH, Chou J (2020) KubeShare: a framework to manage GPUs as first-class and shared resources in container cloud. In: Proceedings of the 29th international symposium on high-performance parallel and distributed computing, pp 173–184

  28. Chiang MC, Chou J (2021) DynamoML: dynamic resource management operators for machine learning workloads. In: CLOSER, pp 122–132

  29. Satzke K, Akkus IE, Chen R, Rimac I, Stein M, Beck A, Aditya P, Vanga M, Hilt V (2020) Efficient GPU sharing for serverless workflows. In: Proceedings of the 1st workshop on high performance serverless computing, pp 17–24

  30. Vinoski S (2002) Chain of responsibility. IEEE Internet Comput 6(6):80–83

  31. Single Root I/O Virtualization and Sharing Specification Revision 1.1 (2010) https://members.pcisig.com/wg/PCI-SIG/document/download/8238. Accessed 20 Jan

  32. Kang J, Lim J, Yu H (2020) Partial migration technique for GPGPU tasks to prevent GPU memory starvation in RPC-based GPU virtualization. Softw Pract Exp 50(6):948–972

  33. Xiao S, Balaji P, Zhu Q, Thakur R, Coghlan S, Lin H, Wen G, Hong J, Feng W-c (2012) VOCL: an optimized environment for transparent virtualization of graphics processing units. In: 2012 innovative parallel computing (InPar), pp 1–12. IEEE

  34. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, Lindahl E (2015) GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1:19–25

  35. Paszke A, Gross S, Massa F et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pp 8024–8035

  36. Vouzis PD, Sahinidis NV (2011) GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2):182–188

  37. Anderson JA, Glaser J, Glotzer SC (2020) HOOMD-blue: a Python package for high-performance molecular dynamics and hard particle Monte Carlo simulations. Comput Mater Sci 173:109363

  38. Liu Z, Chen C, Li J, Cheng Y, Kou Y, Zhang D (2022) KubFBS: a fine-grained and balance-aware scheduling system for deep learning tasks based on Kubernetes. Concurr Comput Pract Exp 34(11):6836

  39. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283

  40. Reaño C, Prades J, Silla F (2018) Exploring the use of remote GPU virtualization in low-power systems for bioinformatics applications. In: Proceedings of the 47th international conference on parallel processing companion, pp 1–8

Acknowledgements

This work was supported by the Shanghai Engineering Research Center of Intelligent Computing System, Shanghai University (Grant No. 19DZ2252600), the National Key Research and Development Program of China (No. 2018YFB0704400), the Key Program of Science and Technology of Yunnan Province (No. 202002AB080001-2), the Grand Joint Projects of Shanghai University (Grant No. 202124), and GHfund B (Grant No. 20210702).

Author information

Corresponding author

Correspondence to Zhengsen Liu.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shen, W., Liu, Z., Tan, Y. et al. KubeGPU: efficient sharing and isolation mechanisms for GPU resource management in container cloud. J Supercomput 79, 591–625 (2023). https://doi.org/10.1007/s11227-022-04682-2
