
Automatically Setting Parameter-Exchanging Interval for Deep Learning


Abstract

Parameter-server frameworks play an important role in scaling up distributed deep learning algorithms. However, the constant growth of neural network size has made exchanging parameters across machines a serious bottleneck. Recent efforts rely on manually setting a parameter-exchanging interval to reduce communication overhead, without regard to the parameter server's resource availability. An inappropriate interval can lead to poor performance or inaccurate results, and bursts of exchange requests may occur, further exacerbating the bottleneck.

In this paper, we propose an approach that automatically sets the optimal exchanging interval, aiming to remove the parameter-exchanging bottleneck and to utilize resources evenly without losing training accuracy. The key idea is to adjust the interval on each training node according to the available resources, and to choose a different interval for each slave node so that exchange requests do not burst at the parameter server. We applied this method to the parallel Stochastic Gradient Descent algorithm and sped up the parameter-exchanging process by a factor of eight.
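As a rough illustration of this idea, the following minimal Python sketch shows how a slave node might derive its own exchange interval from the parameter server's current resource availability, plus a node-specific offset that staggers requests so they do not arrive in bursts. Everything here is an assumption for illustration only: the class and function names (ToyParameterServer, choose_interval, train_slave), the resource model, and the toy least-squares SGD problem are not taken from the paper's implementation.

# Minimal sketch (illustrative assumptions only, not the paper's implementation)
# of per-node parameter-exchanging intervals for parallel SGD.

import numpy as np


class ToyParameterServer:
    """Keeps a global parameter vector and reports a mock resource score in (0, 1]."""

    def __init__(self, dim):
        self.params = np.zeros(dim)

    def resource_score(self):
        # Stand-in for measured CPU/network availability on the server.
        return np.random.uniform(0.2, 1.0)

    def exchange(self, local_params):
        # Merge the incoming parameters into the global copy and return it.
        self.params = 0.5 * (self.params + local_params)
        return self.params.copy()


def choose_interval(node_id, num_nodes, base_interval, resource_score):
    """Longer interval when the server is busy; node offset de-synchronizes requests."""
    scaled = int(round(base_interval / max(resource_score, 0.1)))
    return scaled + (node_id % num_nodes)


def train_slave(node_id, num_nodes, server, data, targets, lr=0.01, base_interval=8):
    """Local SGD on a toy least-squares objective, exchanging every `interval` steps."""
    w = np.zeros(data.shape[1])
    interval = choose_interval(node_id, num_nodes, base_interval,
                               server.resource_score())
    for step in range(1, len(data) + 1):
        x, y = data[step - 1], targets[step - 1]
        grad = (x @ w - y) * x                  # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad                          # local SGD update
        if step % interval == 0:
            w = server.exchange(w)              # push local params, pull merged params
            # Re-estimate the interval from the server's current load.
            interval = choose_interval(node_id, num_nodes, base_interval,
                                       server.resource_score())
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0])
    server = ToyParameterServer(dim=4)
    for node in range(4):                       # sequential stand-in for 4 slave nodes
        train_slave(node, num_nodes=4, server=server, data=X, targets=y)
    print("merged parameters:", server.params)

In this sketch, a lower resource score (a busier server) lengthens the interval, so nodes exchange less often when the server is loaded, while the node_id offset shifts each node's exchange steps to avoid simultaneous requests.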



Acknowledgments

This work is supported by the National High-tech Research and Development Program of China (863 Program) under grant No. 2015AA015303, the National Natural Science Foundation of China under grants No. 61322210, 61272408, and 61433019, and the Doctoral Fund of the Ministry of Education of China under grant No. 20130142110048.

Author information


Corresponding author

Correspondence to Xiaofei Liao.


Cite this article

Wang, S., Liao, X., Fan, X. et al. Automatically Setting Parameter-Exchanging Interval for Deep Learning. Mobile Netw Appl 22, 186–194 (2017). https://doi.org/10.1007/s11036-016-0740-6
