Abstract
Improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has recently become a pressing concern. Traditional cluster-orchestration platforms allocate GPUs to applications exclusively, which limits resource utilization. Co-executing GPU applications has been proposed to make better use of these limited resources. However, co-executing GPU applications without considering their diverse characteristics can lead to unpredictable performance, owing to interference caused by contention and unbalanced resource usage among applications. This paper proposes an interference-aware execution framework with Co-scheML for various GPU applications, such as high-performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the degree of interference among them. Because interference is difficult to predict owing to the complexity of GPU systems, an interference model is built by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler then deploys applications so as to minimize the interference predicted by this model. Experimental results demonstrate that, compared with baseline schedulers, our framework improves resource utilization by 24% and the average job completion time (JCT) by 23%, and shortens the makespan by 22% on average.
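The placement idea described above can be sketched in a few lines. The following is an illustrative sketch only, not the authors' implementation: the job signatures (`sm_util`, `mem_bw`, `pcie`) are hypothetical profiled metrics, and `predict_interference` is a hand-written stand-in for the paper's trained ML model. The scheduler greedily places each job on the node where its predicted interference with already-resident jobs is lowest.

```python
# Hedged sketch of interference-aware placement (not the Co-scheML code).
# Assumption: each job carries a normalized resource-usage signature
# obtained from profiling; the real system would replace
# predict_interference() with a model trained on GPU metrics.

from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    sm_util: float   # streaming-multiprocessor utilization, in [0, 1]
    mem_bw: float    # GPU memory-bandwidth usage, in [0, 1]
    pcie: float      # host-device transfer intensity, in [0, 1]

@dataclass
class Node:
    name: str
    jobs: list = field(default_factory=list)

def predict_interference(a: Job, b: Job) -> float:
    """Stand-in for the learned model: interference is assumed to grow
    when two jobs stress the same resource dimension."""
    return a.sm_util * b.sm_util + a.mem_bw * b.mem_bw + a.pcie * b.pcie

def node_interference(node: Node, job: Job) -> float:
    # Total predicted interference of `job` with jobs already on the node.
    return sum(predict_interference(job, resident) for resident in node.jobs)

def schedule(job: Job, nodes: list) -> Node:
    # Greedy placement: pick the node minimizing predicted interference.
    best = min(nodes, key=lambda n: node_interference(n, job))
    best.jobs.append(job)
    return best

nodes = [Node("gpu-0"), Node("gpu-1")]
workloads = [
    Job("lammps", sm_util=0.9, mem_bw=0.3, pcie=0.2),        # HPC
    Job("resnet-train", sm_util=0.8, mem_bw=0.7, pcie=0.4),  # DL training
    Job("resnet-infer", sm_util=0.3, mem_bw=0.2, pcie=0.6),  # DL inference
]
for job in workloads:
    placed_on = schedule(job, nodes)
    print(job.name, "->", placed_on.name)
```

Under this toy predictor, the two compute-heavy jobs (HPC and DL training) land on different nodes, while the lighter inference job is co-located with the HPC job, mirroring the intuition that jobs with complementary resource signatures interfere least.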
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Aupy, G., Benoit, A., Goglin, B., Pottier, L., Robert, Y.: Co-scheduling HPC workloads on cache-partitioned CMP platforms. Int. J. High Perform. Comput. Appl. 33(6), 1221–1239 (2019)
Bao, Y., et al.: Deep learning-based job placement in distributed machine learning clusters. In: IEEE INFOCOM 2019—IEEE Conference on Computer Communications (2019)
Chang, C.C., Yang, S.R., et al.: A Kubernetes-based monitoring platform for dynamic cloud resource provisioning. In: GLOBECOM 2017—2017 IEEE Global Communications Conference (2017)
Chen, Z., Quan, W., et al.: Deep learning research and development platform: characterizing and scheduling with GOS guarantees on GPU clusters. IEEE Trans. Parallel Distrib. Syst. 31, 34–50 (2019)
Dauwe, D., Jonardi, E., Friese, R., Pasricha, S., Maciejewski, A.A., Bader, D.A., Siegel, H.J.: A methodology for co-location aware application performance modeling in multicore computing. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 434–443. IEEE (2015)
Diab, K.M., et al.: Dynamic sharing of GPUs in cloud systems. In: IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (2013)
Geng, X., Zhang, H., et al.: Interference-aware parallelization for deep learning workload in GPU cluster. Clust. Comput. 23, 2689–2702 (2020)
Gu, J., et al.: GaiaGPU: sharing GPUs in container clouds. In: IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018)
Gu, J., Chowdhury, M., et al.: Tiresias: a GPU cluster manager for distributed deep learning. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19) (2019)
Hong, C.-H., et al.: FairGV: fair and fast GPU virtualization. IEEE Trans. Parallel Distrib. Syst. 28(12), 3472–3485 (2017)
InfluxDB: https://www.influxdata.com/
Jiang, Y., Shen, X., Jie, C., Tripathi, R.: Analysis and approximation of optimal co-scheduling on chip multiprocessors. In: 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 220–229. IEEE (2008)
Kim, S., Kim, Y.: Co-scheML: interference-aware container co-scheduling scheme using machine learning application profiles for GPU clusters. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp. 104–108. IEEE (2020)
Kim, S., Kim, Y.: Toward interference-aware GPU container co-scheduling learning from application profiles. In: 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C). IEEE, pp. 19–23 (2020)
Kubernetes: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ (2020)
LAMMPS-Molecular-Dynamics-Simulator: https://lammps.sandia.gov/
Liaw, R., Bhardwaj, R., et al.: Hypersched: dynamic resource reallocation for model development on a deadline. In: Proceedings of the ACM Symposium on Cloud Computing (2019)
Lo, D., Cheng, L., Govindaraju, R., Ranganathan, P., Kozyrakis, C.: Improving resource efficiency at scale with Heracles. ACM Trans. Comput. Syst. (TOCS) 34(2), 1–33 (2016)
Muralidhara, S.P., Subramanian, L., Mutlu, O., Kandemir, M., Moscibroda, T.: Reducing memory interference in multicore systems via application-aware memory channel partitioning. In: 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 374–385. IEEE (2011)
NVIDIA-GPU-Container (NGC): https://ngc.nvidia.com/
NVIDIA-Multi-Process-Service: https://docs.nvidia.com/deploy/pdf/CUDA-Multi-Process-Service-Overview.pdf (2019)
NVIDIA-VGPU: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html
OpenStack: https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html
Peng, Y., Bao, Y., et al.: Optimus: an efficient dynamic resource scheduler for deep learning clusters. In: Proceedings of the Thirteenth EuroSys Conference (2018)
QMCPACK: https://qmcpack.org/
Song, S., et al.: Gaia scheduler: a Kubernetes-based scheduler framework. In: IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018)
TensorFlow-CNN-benchmarks: https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks
Thinakaran, P., Gunasekaran, J.R., et al.: Kube-knots: resource harvesting through dynamic container orchestration in GPU-based datacenters. In: 2019 IEEE International Conference on Cluster Computing (CLUSTER) (2019)
Ukidave, Y., et al.: Mystic: predictive scheduling for GPU-based cloud servers using machine learning. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2016)
Wen, Y., O’Boyle, M.F., Fensch, C.: MaxPair: enhance OpenCL concurrent kernel execution by weighted maximum matching. In: Proceedings of the 11th Workshop on General Purpose GPUs (2018)
Xiao, W., et al.: Gandiva: Introspective cluster scheduling for deep learning. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (2018)
Xu, X., et al.: Characterization and prediction of performance interference on mediated passthrough GPUs for interference-aware scheduler. In: 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19) (2019)
YARN: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html (2018)
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. 2015M3C4A7065646, 2020R1H1A2011685, NRF-2021R1A2C1003379).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version [15] of this article was presented at the 1st IEEE International Conference on Autonomic Computing and Self-Organizing Systems, Washington, DC, USA, August 2020.
Cite this article
Kim, S., Kim, Y. Interference-aware execution framework with Co-scheML on GPU clusters. Cluster Comput 26, 2577–2589 (2023). https://doi.org/10.1007/s10586-021-03299-z