
Interference-aware parallelization for deep learning workload in GPU cluster


Abstract

With the widespread use of GPUs for deep learning applications, the efficient execution of multiple deep learning jobs in a GPU cluster has attracted considerable attention. Achieving efficient workload parallelization has become more difficult now that modern GPUs support concurrent execution of multiple jobs. Traditional coarse-grained scheduling methods, which consider neither the interference caused by resource contention among co-executing jobs nor the characteristics of deep learning jobs, can lead to unbalanced use of computing resources and further degrade job performance in the GPU cluster. In this paper, we propose a two-stage workload parallelization approach for deep learning training workloads. We first propose two interference-aware prediction models: the Interference-Aware Similarity Prediction (IASP) model based on deep collaborative filtering and the Interference-Aware Performance Prediction (IAPP) model based on a deep neural network. Our parallelization approach comprises a cluster-level and a node-level workload parallelization strategy. Specifically, the Cluster-Level Workload Parallelization (CLWP) strategy assigns deep learning jobs to appropriate worker nodes according to the proposed IASP model, and the Node-Level Workload Parallelization (NLWP) strategy places deep learning tasks on appropriate GPUs according to the proposed IAPP model and the communication costs among tasks. We evaluate our workload parallelization strategy on a prototype platform against other widely used methods. The experimental results show that the proposed strategy improves GPU utilization by 18% on average and reduces job completion time by around 22%.
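The two-stage approach outlined in the abstract can be read as a greedy placement loop: first pick a worker node for a job using the interference prediction (IASP), then place the job's tasks on that node's GPUs using the predicted per-GPU slowdown (IAPP) plus inter-task communication cost. The sketch below is only an illustration of that control flow under stated assumptions; the scoring functions `iasp_score`, `iapp_slowdown`, and `comm_cost` are hypothetical stand-ins for the paper's learned models and cluster topology, and the data structures are not the authors' implementation.

```python
# Illustrative two-stage placement: cluster-level node selection (CLWP-style)
# followed by node-level GPU placement (NLWP-style). All scoring functions are
# hypothetical placeholders, not the paper's trained IASP/IAPP models.
from dataclasses import dataclass, field

@dataclass
class Gpu:
    gpu_id: int
    resident_jobs: list = field(default_factory=list)

@dataclass
class Node:
    name: str
    gpus: list = field(default_factory=list)

def iasp_score(job: str, node: Node) -> float:
    """Stand-in for the IASP model: lower means less predicted interference
    between `job` and the jobs already running on `node` (toy proxy: load)."""
    return sum(len(g.resident_jobs) for g in node.gpus)

def iapp_slowdown(task: str, gpu: Gpu) -> float:
    """Stand-in for the IAPP model: predicted slowdown if `task` shares
    this GPU with its resident jobs (toy proxy: number of co-runners)."""
    return 0.1 * len(gpu.resident_jobs)

def comm_cost(gpu_a: Gpu, gpu_b: Gpu) -> float:
    """Stand-in for inter-GPU communication cost (e.g. same device vs PCIe)."""
    return 0.0 if gpu_a.gpu_id == gpu_b.gpu_id else 1.0

def cluster_level_place(job: str, nodes: list) -> Node:
    """CLWP-style step: choose the node with the lowest predicted interference."""
    return min(nodes, key=lambda n: iasp_score(job, n))

def node_level_place(tasks: list, node: Node) -> dict:
    """NLWP-style step: greedily assign each task to the GPU minimizing
    predicted slowdown plus communication cost to already-placed tasks."""
    placement = {}
    for task in tasks:
        def cost(gpu):
            comm = sum(comm_cost(gpu, placement[t]) for t in placement)
            return iapp_slowdown(task, gpu) + comm
        best = min(node.gpus, key=cost)
        placement[task] = best
        best.resident_jobs.append(task)
    return placement

if __name__ == "__main__":
    cluster = [Node("node-0", [Gpu(0), Gpu(1)]), Node("node-1", [Gpu(0), Gpu(1)])]
    node = cluster_level_place("resnet50-train", cluster)
    plan = node_level_place(["worker-0", "worker-1", "worker-2"], node)
    for task, gpu in plan.items():
        print(f"{task} -> {node.name}:GPU{gpu.gpu_id}")
```

In this toy version the greedy GPU loop trades predicted contention against communication cost, which mirrors the stated inputs of the NLWP strategy; the actual paper learns these scores with deep collaborative filtering and a deep neural network rather than the heuristics used here.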



Funding

This work is supported in part by the NSFC under Grant No. 61720106007, the Funds for Creative Research Groups of China under Grant No. 61921003, and the 111 Project (B18008).

Author information


Corresponding author

Correspondence to Haitao Zhang.


About this article


Cite this article

Geng, X., Zhang, H., Zhao, Z. et al. Interference-aware parallelization for deep learning workload in GPU cluster. Cluster Comput 23, 2689–2702 (2020). https://doi.org/10.1007/s10586-019-03037-6

