
Interference-aware execution framework with Co-scheML on GPU clusters

Published in Cluster Computing.

Abstract

Improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has recently become a pressing concern. Traditional cluster-orchestration platforms allocate GPUs to applications exclusively, which constrains resource utilization. Co-executing GPU applications has been suggested as a way to make better use of limited resources; however, co-execution that ignores the diverse characteristics of GPU applications can make their performance unpredictable, owing to interference from contention for, and unbalanced use of, shared resources. This paper proposes an interference-aware execution framework with Co-scheML for diverse GPU applications, including high-performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the varying degrees of interference among them. Because interference is difficult to predict owing to the complexity of GPU systems, an interference model is built by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler then deploys applications so as to minimize interference, guided by the model's predictions. Experimental results show that, compared with baseline schedulers, the framework improves resource utilization by 24%, average job completion time (JCT) by 23%, and makespan by 22% on average.
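The scheduling idea described in the abstract — profile GPU metrics offline, learn an interference model from those profiles, then place each incoming application where predicted interference is lowest — can be sketched as follows. This is an illustrative stand-in only: the metric set, the profile data, and the k-nearest-neighbour predictor are hypothetical simplifications, not the paper's actual Co-scheML model or metrics.

```python
# Hypothetical sketch of interference-aware placement; all names,
# metrics, and numbers are illustrative, not taken from the paper.
from dataclasses import dataclass

@dataclass
class App:
    name: str
    # Profiled GPU metrics (hypothetical), e.g. SM utilization,
    # memory-bandwidth utilization, PCIe traffic -- normalized to [0, 1].
    metrics: tuple

# Offline profiles: ((metrics of app A, metrics of app B), observed slowdown)
PROFILES = [
    (((0.9, 0.8, 0.1), (0.9, 0.7, 0.2)), 1.8),  # two compute-heavy apps
    (((0.9, 0.8, 0.1), (0.2, 0.1, 0.7)), 1.1),  # compute-heavy + I/O-heavy
    (((0.3, 0.2, 0.6), (0.2, 0.1, 0.7)), 1.2),  # two lighter apps
]

def predict_interference(a: App, b: App, k: int = 2) -> float:
    """k-nearest-neighbour stand-in for a trained ML interference model."""
    def dist(pair):
        (ma, mb), _ = pair
        return sum((x - y) ** 2 for x, y in zip(ma + mb, a.metrics + b.metrics))
    nearest = sorted(PROFILES, key=dist)[:k]
    return sum(slowdown for _, slowdown in nearest) / k

def schedule(new_app: App, nodes: dict) -> str:
    """Place new_app on the node whose resident app gives the
    lowest predicted co-execution interference."""
    return min(nodes, key=lambda n: predict_interference(new_app, nodes[n]))
```

In this toy setup, a compute-heavy application would be steered away from a node already running another compute-heavy application and toward one running an I/O-heavy resident, because the model predicts a smaller slowdown for the latter pairing.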


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

References

  1. Aupy, G., Benoit, A., Goglin, B., Pottier, L., Robert, Y.: Co-scheduling HPC workloads on cache-partitioned CMP platforms. Int. J. High Perform. Comput. Appl. 33(6), 1221–1239 (2019)

  2. Bao, Y., et al.: Deep learning-based job placement in distributed machine learning clusters. In: IEEE INFOCOM 2019—IEEE Conference on Computer Communications (2019)

  3. Chang, C.C., Yang, S.R., et al.: A Kubernetes-based monitoring platform for dynamic cloud resource provisioning. In: GLOBECOM 2017—2017 IEEE Global Communications Conference (2017)

  4. Chen, Z., Quan, W., et al.: Deep learning research and development platform: characterizing and scheduling with QoS guarantees on GPU clusters. IEEE Trans. Parallel Distrib. Syst. 31, 34–50 (2019)

  5. Dauwe, D., Jonardi, E., Friese, R., Pasricha, S., Maciejewski, A.A., Bader, D.A., Siegel, H.J.: A methodology for co-location aware application performance modeling in multicore computing. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 434–443. IEEE (2015)

  6. Diab, K.M., et al.: Dynamic sharing of GPUs in cloud systems. In: IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (2013)

  7. DJINN: https://github.com/LLNL/DJINN

  8. Geng, X., Zhang, H., et al.: Interference-aware parallelization for deep learning workload in GPU cluster. Clust. Comput. 23, 2689–2702 (2020)

  9. Gu, J., et al.: GaiaGPU: sharing GPUs in container clouds. In: IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018)

  10. Gu, J., Chowdhury, M., et al.: Tiresias: a GPU cluster manager for distributed deep learning. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19) (2019)

  11. Hong, C.-H., et al.: FairGV: fair and fast GPU virtualization. IEEE Trans. Parallel Distrib. Syst. 28(12), 3472–3485 (2017)

  12. InfluxDB: https://www.influxdata.com/

  13. Jiang, Y., Shen, X., Jie, C., Tripathi, R.: Analysis and approximation of optimal co-scheduling on chip multiprocessors. In: 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 220–229. IEEE (2008)

  14. Kim, S., Kim, Y.: Co-scheML: interference-aware container co-scheduling scheme using machine learning application profiles for GPU clusters. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp. 104–108. IEEE (2020)

  15. Kim, S., Kim, Y.: Toward interference-aware GPU container co-scheduling learning from application profiles. In: 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C). IEEE, pp. 19–23 (2020)

  16. Kubernetes: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ (2020)

  17. LAMMPS-Molecular-Dynamics-Simulator: https://lammps.sandia.gov/

  18. Liaw, R., Bhardwaj, R., et al.: HyperSched: dynamic resource reallocation for model development on a deadline. In: Proceedings of the ACM Symposium on Cloud Computing (2019)

  19. Lo, D., Cheng, L., Govindaraju, R., Ranganathan, P., Kozyrakis, C.: Improving resource efficiency at scale with Heracles. ACM Trans. Comput. Syst. (TOCS) 34(2), 1–33 (2016)

  20. Muralidhara, S.P., Subramanian, L., Mutlu, O., Kandemir, M., Moscibroda, T.: Reducing memory interference in multicore systems via application-aware memory channel partitioning. In: 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 374–385. IEEE (2011)

  21. NVIDIA-GPU-Container(NGC): https://ngc.nvidia.com/

  22. NVIDIA-Multi-Process-Service: https://docs.nvidia.com/deploy/pdf/CUDA-Multi-Process-Service-Overview.pdf (2019)

  23. NVIDIA-VGPU: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html

  24. Openstack: https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html

  25. Peng, Y., Bao, Y., et al.: Optimus: an efficient dynamic resource scheduler for deep learning clusters. In: Proceedings of the Thirteenth EuroSys Conference (2018)

  26. QMCPACK: https://qmcpack.org/

  27. Song, S., et al.: Gaia scheduler: a Kubernetes-based scheduler framework. In: IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018)

  28. Tensorflow-CNN-benchmarks: https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks

  29. Thinakaran, P., Gunasekaran, J.R., et al.: Kube-Knots: resource harvesting through dynamic container orchestration in GPU-based datacenters. In: 2019 IEEE International Conference on Cluster Computing (CLUSTER) (2019)

  30. Ukidave, Y., et al.: Mystic: predictive scheduling for GPU-based cloud servers using machine learning. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2016)

  31. Wen, Y., O’Boyle, M.F., Fensch, C.: MaxPair: enhance OpenCL concurrent kernel execution by weighted maximum matching. In: Proceedings of the 11th Workshop on General Purpose GPUs (2018)

  32. Xiao, W., et al.: Gandiva: Introspective cluster scheduling for deep learning. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (2018)

  33. Xu, X., et al.: Characterization and prediction of performance interference on mediated passthrough GPUs for interference-aware scheduler. In: 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19) (2019)

  34. YARN: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html (2018)


Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. 2015M3C4A7065646, 2020R1H1A2011685, NRF-2021R1A2C1003379).

Author information

Corresponding author

Correspondence to Yoonhee Kim.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version [15] of this article was presented at the 1st IEEE International Conference on Autonomic Computing and Self-Organizing Systems, Washington, DC, USA, August 2020.


About this article

Cite this article

Kim, S., Kim, Y. Interference-aware execution framework with Co-scheML on GPU clusters. Cluster Comput 26, 2577–2589 (2023). https://doi.org/10.1007/s10586-021-03299-z

