Abstract
In this paper, we propose an asynchronous distributed learning algorithm in which parameter updates are performed simultaneously by worker machines, each on a local sub-part of the training data. The workers send their updates to a master machine that aggregates all received parameters in order to minimize a global empirical loss. The communication exchanges between the workers and the master are generally the bottleneck of most asynchronous scenarios. We propose to reduce this communication cost through a sparsification mechanism in which each worker randomly and independently chooses some local update entries that will not be transmitted to the master. We prove that if the probability of choosing such local entries is high and the global loss is strongly convex, then the whole process is guaranteed to converge to the minimum of the loss. When this probability is low, we show empirically on three datasets that our approach converges to the minimum of the loss in most cases, with a better convergence rate and far fewer parameter exchanges between the master and the worker machines than without our sparsification technique.
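The sparsification mechanism described above can be illustrated with a minimal, synchronous simulation (a sketch only, not the algorithm analyzed in the paper): every worker computes an update on its own data shard, masks each coordinate independently at random before sending it, and the master averages whatever it receives. The `local_gradient` loss, the `keep_prob` parameter, and the synchronous round structure below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(w, X, y, lam=0.1):
    # Gradient of an l2-regularized least-squares loss on one worker's shard
    # (a placeholder for whatever local objective the workers minimize).
    return X.T @ (X @ w - y) / len(y) + lam * w

def sparsify(update, keep_prob, rng):
    # Each coordinate is kept independently with probability keep_prob;
    # dropped coordinates are simply not transmitted (modeled here as zeros).
    # A debiasing step (dividing kept entries by keep_prob) is a common
    # variant, omitted to keep the sketch short.
    mask = rng.random(update.shape) < keep_prob
    return np.where(mask, update, 0.0)

# Toy setup: d-dimensional model, M workers, each holding its own data shard.
d, M, keep_prob, step = 20, 4, 0.5, 0.1
shards = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(M)]
w_master = np.zeros(d)

for _ in range(500):
    # Each worker computes an update on its shard and sends a sparsified
    # version; the master averages what it receives and takes a gradient step.
    received = [sparsify(local_gradient(w_master, X, y), keep_prob, rng)
                for X, y in shards]
    w_master -= step * np.mean(received, axis=0)

full_grad = np.mean([local_gradient(w_master, X, y) for X, y in shards], axis=0)
print("norm of the full gradient at the final iterate:", np.linalg.norm(full_grad))
```

With `keep_prob` close to 1 the iterates behave essentially like plain distributed gradient descent, while lower values trade accuracy of each round for fewer transmitted entries, which is the communication/convergence trade-off the abstract refers to.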
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Grishchenko, D., Iutzeler, F., Amini, M.R. (2020). Sparse Asynchronous Distributed Learning. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol. 1333. Springer, Cham. https://doi.org/10.1007/978-3-030-63823-8_50
DOI: https://doi.org/10.1007/978-3-030-63823-8_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63822-1
Online ISBN: 978-3-030-63823-8
eBook Packages: Computer Science, Computer Science (R0)