Abstract
Parameter-server frameworks play an important role in scaling up distributed deep learning algorithms. However, the constant growth of neural network size has created a serious bottleneck in exchanging parameters across machines. Recent efforts reduce communication overhead by manually setting a parameter-exchanging interval, without regard to the parameter server's resource availability. An inappropriate interval can lead to poor performance or inaccurate results. Moreover, request bursts may occur, further exacerbating the bottleneck.
In this paper, we propose an approach that automatically sets the optimal exchanging interval, aiming to remove the parameter-exchanging bottleneck and to utilize resources evenly without losing training accuracy. The key idea is to adjust the interval on each training node based on knowledge of the available resources, and to assign a different interval to each slave node so as to avoid request bursts. We applied this method to optimize the parallel Stochastic Gradient Descent algorithm, speeding up the parameter-exchanging process by a factor of eight.
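To make the interval-staggering idea concrete, the following is a minimal sketch of how per-slave exchange intervals could be assigned so that requests to the parameter server de-synchronize rather than arrive in bursts. The function names, the base interval, and the staggering rule are illustrative assumptions for exposition, not the authors' exact policy.

```python
# Hypothetical sketch: give each slave node a slightly different
# parameter-exchanging interval so their push/pull requests to the
# parameter server spread out over time instead of bursting.

def staggered_intervals(num_slaves, base_interval, spread=1):
    """Return a per-slave exchange interval (in training iterations).

    Slaves cycle through (spread + 1) distinct intervals around the
    base, so no large group of slaves exchanges at the same moment.
    """
    return [base_interval + (i % (spread + 1)) for i in range(num_slaves)]

def exchange_iterations(interval, num_iters):
    """Iterations at which a slave with the given interval exchanges."""
    return [t for t in range(1, num_iters + 1) if t % interval == 0]

# Four slaves, base interval of 10 iterations, spread of 3:
intervals = staggered_intervals(num_slaves=4, base_interval=10, spread=3)
# -> [10, 11, 12, 13]: each slave exchanges at a distinct cadence,
# so after the first round their requests no longer coincide.
```

In a real deployment the base interval would itself be chosen from the server's measured resource availability, as the paper proposes; this sketch only shows the burst-avoidance staggering.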
Acknowledgments
This work is supported by the National High-tech Research and Development Program of China (863 Program) under grant No. 2015AA015303, the National Natural Science Foundation of China under grants No. 61322210, 61272408, and 61433019, and the Doctoral Fund of the Ministry of Education of China under grant No. 20130142110048.
Cite this article
Wang, S., Liao, X., Fan, X. et al. Automatically Setting Parameter-Exchanging Interval for Deep Learning. Mobile Netw Appl 22, 186–194 (2017). https://doi.org/10.1007/s11036-016-0740-6