Abstract
Improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has recently become a pressing concern. Traditional cluster-orchestration platforms allocate GPUs to applications exclusively, which limits resource utilization. Co-executing GPU applications has been proposed to make better use of these limited resources. However, co-executing GPU applications without considering their diverse characteristics can lead to unpredictable performance, owing to interference caused by contention and unbalanced resource usage among applications. This paper proposes an interference-aware execution framework with Co-scheML for various GPU applications, such as high-performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the degree of interference among them. Because interference is difficult to predict owing to the complexity of GPU systems, an interference model is built by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler then deploys applications so as to minimize the interference predicted by this model. Experimental results demonstrate that, compared with baseline schedulers, our framework improves resource utilization by 24% and the average job completion time (JCT) by 23%, and shortens the makespan by 22% on average.
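The placement idea described above can be sketched in a few lines. The following is an illustrative sketch only, not the authors' implementation: the job signatures (`sm_util`, `mem_bw`, `pcie`) are hypothetical profiled metrics, and `predict_interference` is a hand-written stand-in for the paper's trained ML model. The scheduler greedily places each job on the node where its predicted interference with already-resident jobs is lowest.

```python
# Hedged sketch of interference-aware placement (not the Co-scheML code).
# Assumption: each job carries a normalized resource-usage signature
# obtained from profiling; the real system would replace
# predict_interference() with a model trained on GPU metrics.

from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    sm_util: float   # streaming-multiprocessor utilization, in [0, 1]
    mem_bw: float    # GPU memory-bandwidth usage, in [0, 1]
    pcie: float      # host-device transfer intensity, in [0, 1]

@dataclass
class Node:
    name: str
    jobs: list = field(default_factory=list)

def predict_interference(a: Job, b: Job) -> float:
    """Stand-in for the learned model: interference is assumed to grow
    when two jobs stress the same resource dimension."""
    return a.sm_util * b.sm_util + a.mem_bw * b.mem_bw + a.pcie * b.pcie

def node_interference(node: Node, job: Job) -> float:
    # Total predicted interference of `job` with jobs already on the node.
    return sum(predict_interference(job, resident) for resident in node.jobs)

def schedule(job: Job, nodes: list) -> Node:
    # Greedy placement: pick the node minimizing predicted interference.
    best = min(nodes, key=lambda n: node_interference(n, job))
    best.jobs.append(job)
    return best

nodes = [Node("gpu-0"), Node("gpu-1")]
workloads = [
    Job("lammps", sm_util=0.9, mem_bw=0.3, pcie=0.2),        # HPC
    Job("resnet-train", sm_util=0.8, mem_bw=0.7, pcie=0.4),  # DL training
    Job("resnet-infer", sm_util=0.3, mem_bw=0.2, pcie=0.6),  # DL inference
]
for job in workloads:
    placed_on = schedule(job, nodes)
    print(job.name, "->", placed_on.name)
```

Under this toy predictor, the two compute-heavy jobs (HPC and DL training) land on different nodes, while the lighter inference job is co-located with the HPC job, mirroring the intuition that jobs with complementary resource signatures interfere least.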
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Aupy, G., Benoit, A., Goglin, B., Pottier, L., Robert, Y.: Co-scheduling HPC workloads on cache-partitioned CMP platforms. Int. J. High Perform. Comput. Appl. 33(6), 1221–1239 (2019)
Bao, Y., et al.: Deep learning-based job placement in distributed machine learning clusters. In: IEEE INFOCOM 2019—IEEE Conference on Computer Communications (2019)
Chang, C.C., Yang, S.R., et al.: A Kubernetes-based monitoring platform for dynamic cloud resource provisioning. In: GLOBECOM 2017—2017 IEEE Global Communications Conference (2017)
Chen, Z., Quan, W., et al.: Deep learning research and development platform: characterizing and scheduling with GOS guarantees on GPU clusters. IEEE Trans. Parallel Distrib. Syst. 31, 34–50 (2019)
Dauwe, D., Jonardi, E., Friese, R., Pasricha, S., Maciejewski, A.A., Bader, D.A., Siegel, H.J.: A methodology for co-location aware application performance modeling in multicore computing. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 434–443. IEEE (2015)
Diab, K.M., et al.: Dynamic sharing of GPUs in cloud systems. In: IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (2013)
Geng, X., Zhang, H., et al.: Interference-aware parallelization for deep learning workload in GPU cluster. Clust. Comput. 23, 2689–2702 (2020)
Gu, J., et al.: GaiaGPU: sharing GPUs in container clouds. In: IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018)
Gu, J., Chowdhury, M., et al.: Tiresias: a GPU cluster manager for distributed deep learning. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19) (2019)
Hong, C.-H., et al.: FairGV: fair and fast GPU virtualization. IEEE Trans. Parallel Distrib. Syst. 28(12), 3472–3485 (2017)
InfluxDB: https://www.influxdata.com/
Jiang, Y., Shen, X., Jie, C., Tripathi, R.: Analysis and approximation of optimal co-scheduling on chip multiprocessors. In: 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 220–229. IEEE (2008)
Kim, S., Kim, Y.: Co-scheML: interference-aware container co-scheduling scheme using machine learning application profiles for GPU clusters. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp. 104–108. IEEE (2020)
Kim, S., Kim, Y.: Toward interference-aware GPU container co-scheduling learning from application profiles. In: 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C). IEEE, pp. 19–23 (2020)
Kubernetes: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ (2020)
LAMMPS-Molecular-Dynamics-Simulator: https://lammps.sandia.gov/
Liaw, R., Bhardwaj, R., et al.: Hypersched: dynamic resource reallocation for model development on a deadline. In: Proceedings of the ACM Symposium on Cloud Computing (2019)
Lo, D., Cheng, L., Govindaraju, R., Ranganathan, P., Kozyrakis, C.: Improving resource efficiency at scale with Heracles. ACM Trans. Comput. Syst. (TOCS) 34(2), 1–33 (2016)
Muralidhara, S.P., Subramanian, L., Mutlu, O., Kandemir, M., Moscibroda, T.: Reducing memory interference in multicore systems via application-aware memory channel partitioning. In: 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 374–385. IEEE (2011)
NVIDIA-GPU-Container (NGC): https://ngc.nvidia.com/
NVIDIA-Multi-Process-Service: https://docs.nvidia.com/deploy/pdf/CUDA-Multi-Process-Service-Overview.pdf (2019)
NVIDIA-VGPU: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html
OpenStack: https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html
Peng, Y., Bao, Y., et al.: Optimus: an efficient dynamic resource scheduler for deep learning clusters. In: Proceedings of the Thirteenth EuroSys Conference (2018)
QMCPACK: https://qmcpack.org/
Song, S., et al.: Gaia scheduler: a Kubernetes-based scheduler framework. In: IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018)
TensorFlow-CNN-benchmarks: https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks
Thinakaran, P., Gunasekaran, J.R., et al.: Kube-knots: resource harvesting through dynamic container orchestration in GPU-based datacenters. In: 2019 IEEE International Conference on Cluster Computing (CLUSTER) (2019)
Ukidave, Y., et al.: Mystic: predictive scheduling for GPU-based cloud servers using machine learning. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2016)
Wen, Y., O’Boyle, M.F., Fensch, C.: MaxPair: enhance OpenCL concurrent kernel execution by weighted maximum matching. In: Proceedings of the 11th Workshop on General Purpose GPUs (2018)
Xiao, W., et al.: Gandiva: Introspective cluster scheduling for deep learning. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (2018)
Xu, X., et al.: Characterization and prediction of performance interference on mediated passthrough GPUs for interference-aware scheduler. In: 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19) (2019)
YARN: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html (2018)
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. 2015M3C4A7065646, 2020R1H1A2011685, NRF-2021R1A2C1003379).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version [15] of this article was presented at the 1st IEEE International Conference on Autonomic Computing and Self-Organizing Systems, Washington, DC, USA, August 2020.
Cite this article
Kim, S., Kim, Y. Interference-aware execution framework with Co-scheML on GPU clusters. Cluster Comput 26, 2577–2589 (2023). https://doi.org/10.1007/s10586-021-03299-z