
Interference-aware parallelization for deep learning workload in GPU cluster


Abstract

With the widespread use of GPUs for deep learning applications, the efficient execution of multiple deep learning jobs in a GPU cluster has attracted considerable attention. Achieving efficient workload parallelization has become more difficult now that modern GPUs support concurrent execution of multiple jobs. Traditional coarse-grained scheduling methods, which consider neither the interference caused by resource contention among co-executing jobs nor the characteristics of deep learning jobs, can lead to unbalanced use of computing resources and further degrade job performance in the GPU cluster. In this paper, we propose a two-stage workload parallelization approach for deep learning training workloads. We first propose two interference-aware prediction models: the Interference-Aware Similarity Prediction (IASP) model based on deep collaborative filtering and the Interference-Aware Performance Prediction (IAPP) model based on a deep neural network. Our parallelization approach comprises a cluster-level and a node-level workload parallelization strategy. Specifically, the Cluster-Level Workload Parallelization (CLWP) strategy assigns deep learning jobs to appropriate worker nodes according to the proposed IASP model, and the Node-Level Workload Parallelization (NLWP) strategy places deep learning tasks on appropriate GPUs according to the proposed IAPP model and the communication costs among tasks. We evaluate our workload parallelization strategy on a prototype platform against other widely used methods. The experimental results show that the proposed strategy improves GPU utilization by 18% on average and reduces job completion time by around 22%.
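The two-stage approach outlined in the abstract can be read as a greedy placement loop: first pick a worker node for a job using the interference prediction (IASP), then place the job's tasks on that node's GPUs using the predicted per-GPU slowdown (IAPP) plus inter-task communication cost. The sketch below is only an illustration of that control flow under stated assumptions; the scoring functions `iasp_score`, `iapp_slowdown`, and `comm_cost` are hypothetical stand-ins for the paper's learned models and cluster topology, and the data structures are not the authors' implementation.

```python
# Illustrative two-stage placement: cluster-level node selection (CLWP-style)
# followed by node-level GPU placement (NLWP-style). All scoring functions are
# hypothetical placeholders, not the paper's trained IASP/IAPP models.
from dataclasses import dataclass, field

@dataclass
class Gpu:
    gpu_id: int
    resident_jobs: list = field(default_factory=list)

@dataclass
class Node:
    name: str
    gpus: list = field(default_factory=list)

def iasp_score(job: str, node: Node) -> float:
    """Stand-in for the IASP model: lower means less predicted interference
    between `job` and the jobs already running on `node` (toy proxy: load)."""
    return sum(len(g.resident_jobs) for g in node.gpus)

def iapp_slowdown(task: str, gpu: Gpu) -> float:
    """Stand-in for the IAPP model: predicted slowdown if `task` shares
    this GPU with its resident jobs (toy proxy: number of co-runners)."""
    return 0.1 * len(gpu.resident_jobs)

def comm_cost(gpu_a: Gpu, gpu_b: Gpu) -> float:
    """Stand-in for inter-GPU communication cost (e.g. same device vs PCIe)."""
    return 0.0 if gpu_a.gpu_id == gpu_b.gpu_id else 1.0

def cluster_level_place(job: str, nodes: list) -> Node:
    """CLWP-style step: choose the node with the lowest predicted interference."""
    return min(nodes, key=lambda n: iasp_score(job, n))

def node_level_place(tasks: list, node: Node) -> dict:
    """NLWP-style step: greedily assign each task to the GPU minimizing
    predicted slowdown plus communication cost to already-placed tasks."""
    placement = {}
    for task in tasks:
        def cost(gpu):
            comm = sum(comm_cost(gpu, placement[t]) for t in placement)
            return iapp_slowdown(task, gpu) + comm
        best = min(node.gpus, key=cost)
        placement[task] = best
        best.resident_jobs.append(task)
    return placement

if __name__ == "__main__":
    cluster = [Node("node-0", [Gpu(0), Gpu(1)]), Node("node-1", [Gpu(0), Gpu(1)])]
    node = cluster_level_place("resnet50-train", cluster)
    plan = node_level_place(["worker-0", "worker-1", "worker-2"], node)
    for task, gpu in plan.items():
        print(f"{task} -> {node.name}:GPU{gpu.gpu_id}")
```

In this toy version the greedy GPU loop trades predicted contention against communication cost, which mirrors the stated inputs of the NLWP strategy; the actual paper learns these scores with deep collaborative filtering and a deep neural network rather than the heuristics used here.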



Funding

This work is supported in part by the NSFC under Grant No. 61720106007, the Funds for Creative Research Groups of China under Grant No. 61921003, and the 111 Project (B18008).

Author information


Corresponding author

Correspondence to Haitao Zhang.


About this article


Cite this article

Geng, X., Zhang, H., Zhao, Z. et al. Interference-aware parallelization for deep learning workload in GPU cluster. Cluster Comput 23, 2689–2702 (2020). https://doi.org/10.1007/s10586-019-03037-6

