
Sparse Asynchronous Distributed Learning

  • Conference paper
Neural Information Processing (ICONIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1333)


Abstract

In this paper, we propose an asynchronous distributed learning algorithm in which parameter updates are performed by worker machines simultaneously, each on a local subset of the training data. The workers send their updates to a master machine that aggregates all received parameters in order to minimize a global empirical loss. The communication exchanges between the workers and the master are generally the bottleneck of most asynchronous scenarios. We propose to reduce this communication cost with a sparsification mechanism in which each worker machine randomly and independently selects some entries of its local update that will not be transmitted to the master. We prove that if the probability of selecting such local entries is high and the global loss is strongly convex, then the whole process is guaranteed to converge to the minimum of the loss. When this probability is low, we show empirically on three datasets that our approach still converges to the minimum of the loss in most cases, with a better convergence rate and far fewer parameter exchanges between the master and the worker machines than without our sparsification technique.
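To make the sparsification mechanism concrete, here is a minimal, self-contained sketch. It is not the authors' algorithm: the least-squares objective, the step size, the keep probability keep_prob, and the function names worker_update and master_apply are illustrative assumptions; the only element taken from the abstract is that each worker drops entries of its local update independently at random and sends only the surviving entries to the master.

```python
# Hedged sketch of randomly sparsified worker-to-master updates (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def worker_update(w, X_local, y_local, lr=0.1, keep_prob=0.5):
    """One local least-squares gradient step, then independent random coordinate dropping."""
    grad = X_local.T @ (X_local @ w - y_local) / len(y_local)
    delta = -lr * grad                           # dense local update
    mask = rng.random(w.shape[0]) < keep_prob    # each entry kept independently with prob. keep_prob
    idx = np.flatnonzero(mask)
    return idx, delta[idx]                       # only the surviving entries are communicated

def master_apply(w, idx, values):
    """The master applies each sparse update as soon as it arrives (asynchronously)."""
    w[idx] += values
    return w

# Toy usage: synthetic data split across two "workers".
d = 10
X = rng.normal(size=(100, d))
y = X @ rng.normal(size=d)
w = np.zeros(d)
for X_loc, y_loc in ((X[:50], y[:50]), (X[50:], y[50:])):
    idx, vals = worker_update(w, X_loc, y_loc)
    w = master_apply(w, idx, vals)
```

In an actual distributed setting, only the index/value pairs would travel over the network, which is where the communication savings come from.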


Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

  2. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html.

  3. https://mpi4py.readthedocs.io/en/stable/citing.html.
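The footnotes above point to the tools behind the experiments: LIBSVM-format datasets, scikit-learn's normalize for preprocessing, and mpi4py for master/worker communication. The sketch below shows how these pieces could plausibly be wired together; the file name dataset.libsvm and the contiguous sharding are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: load a LIBSVM-format dataset, row-normalize it with scikit-learn,
# and split the examples across MPI workers with mpi4py. "dataset.libsvm" is a placeholder.
import numpy as np
from mpi4py import MPI
from sklearn.datasets import load_svmlight_file
from sklearn.preprocessing import normalize

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

X, y = load_svmlight_file("dataset.libsvm")  # sparse feature matrix and labels
X = normalize(X)                             # unit l2-norm rows

# Each worker keeps its own contiguous shard of the examples (illustrative split).
shard = np.array_split(np.arange(X.shape[0]), size)[rank]
X_local, y_local = X[shard], y[shard]
```

Run under MPI (e.g. `mpiexec -n 4 python script.py`), each process would then perform its local updates on X_local, y_local and exchange sparsified parameters with the master (rank 0).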


Author information


Corresponding author

Correspondence to Dmitry Grischenko.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 157 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Grischenko, D., Iutzeler, F., Amini, M.R. (2020). Sparse Asynchronous Distributed Learning. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1333. Springer, Cham. https://doi.org/10.1007/978-3-030-63823-8_50

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-63823-8_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63822-1

  • Online ISBN: 978-3-030-63823-8

  • eBook Packages: Computer Science, Computer Science (R0)
