Abstract
Deep learning has shown considerable promise in numerous practical machine learning applications, but training deep learning models is highly time-consuming. To address this problem, many studies design distributed deep learning systems with multiple graphics processing units (GPUs) on a single machine or across machines. Data parallelism is the usual method for exploiting multiple GPUs, but it is not suitable for all deep learning models, such as the fully connected deep neural network (DNN), because of its transfer overhead. In this paper we analyze this transfer overhead and find that parameter synchronization is its key cause. To reduce parameter synchronization, we propose a multi-GPU framework based on model averaging, in which each GPU trains a complete model until convergence and the CPU then averages these models to obtain the final model. The only parameter synchronization occurs after all GPUs have finished training, which dramatically reduces transfer overhead. Experimental results show that the model averaging method achieves speedups of 1.6x with two GPUs and 1.8x with four GPUs over training on a single GPU. Compared with the data parallelism method, it achieves speedups of 17x and 25x on two and four GPUs, respectively.
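The scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it substitutes a convex linear-regression toy for a DNN and plain processes for GPUs, and the function names (`train_replica`, `model_average`) are invented for this sketch. The essential structure is the same: each replica trains independently to convergence with no communication, and a single averaging step at the end produces the final parameters.

```python
import numpy as np

def train_replica(X, y, seed, epochs=200, lr=0.1):
    """Independently train one model replica with gradient descent.
    Stands in for the per-GPU training phase: no communication with
    other replicas occurs until training has converged."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])          # replica-specific initialization
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def model_average(replicas):
    """The single parameter synchronization: average the converged
    parameter vectors into the final model."""
    return np.mean(replicas, axis=0)

# Toy data generated by y = 3*x0 - 2*x1
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))
y = X @ np.array([3.0, -2.0])

# "Two GPUs": two replicas trained from different initializations.
replicas = [train_replica(X, y, seed=s) for s in (1, 2)]
w_avg = model_average(replicas)
print(np.round(w_avg, 2))  # close to [3, -2]
```

Because each replica only exchanges parameters once, at the end, the transfer cost is a single model-sized copy per GPU, independent of the number of training iterations; under per-iteration data parallelism that cost is paid every mini-batch.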
Acknowledgments
This work is supported by the National High-tech Research and Development Program of China (863 Program) under grant No. 2015AA015303, the National Natural Science Foundation of China under grants No. 61322210, 61272408, and 61433019, and the Doctoral Fund of the Ministry of Education of China under grant No. 20130142110048.
Additional information
This article is part of the Topical Collection: Special Issue on Big Data Networking
Guest Editors: Xiaofei Liao, Song Guo, Deze Zeng, and Kun Wang
Yao, Q., Liao, X. & Jin, H. Training deep neural network on multiple GPUs with a model averaging method. Peer-to-Peer Netw. Appl. 11, 1012–1021 (2018). https://doi.org/10.1007/s12083-017-0574-4