Abstract
In many distributed learning problems, the heterogeneous loading of computing machines may harm the overall performance of synchronous strategies: each machine begins a new round of computation only after receiving the aggregated information from a master, so any delay in sending local information to the master becomes a bottleneck. In this paper, we propose an effective asynchronous distributed framework for minimizing a sum of smooth functions, where each machine performs iterations in parallel on its local function and updates a shared parameter asynchronously. In this way, all machines can work continuously, even when they do not hold the latest version of the shared parameter. We prove the convergence of this general distributed asynchronous method for gradient iterations and then show its efficiency on the matrix factorization problem for recommender systems and on binary classification.
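To make the setting concrete, here is a minimal sketch of an asynchronous scheme of the kind described above, with several workers taking local gradient steps on their own smooth functions and pushing updates to a shared parameter without waiting for each other. This is not the authors' exact algorithm: the quadratic local losses, the step size, and the simple averaging rule used when writing back the parameter are illustrative assumptions chosen only to keep the example self-contained.

```python
# Sketch of asynchronous parameter exchanges with local gradient steps.
# Assumptions (not from the paper): quadratic local losses, fixed step
# size, and averaging of the pushed update with the shared parameter.
import threading
import numpy as np

rng = np.random.default_rng(0)
DIM, N_WORKERS, N_LOCAL_STEPS, STEP = 5, 4, 200, 0.05

# Worker i holds a smooth local function f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((10, DIM)) for _ in range(N_WORKERS)]
b = [rng.standard_normal(10) for _ in range(N_WORKERS)]

shared_x = np.zeros(DIM)      # parameter held by the "master"
lock = threading.Lock()       # protects reads/writes of the shared parameter

def worker(i):
    global shared_x
    for _ in range(N_LOCAL_STEPS):
        # Read a (possibly stale) copy of the shared parameter.
        with lock:
            x = shared_x.copy()
        # One local gradient step on f_i.
        grad = A[i].T @ (A[i] @ x - b[i])
        x_local = x - STEP * grad
        # Push the result asynchronously; here it is merged by simple
        # averaging with the current shared value (an illustrative choice).
        with lock:
            shared_x = 0.5 * (shared_x + x_local)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_loss = sum(0.5 * np.linalg.norm(A[i] @ shared_x - b[i]) ** 2
                 for i in range(N_WORKERS))
print("final objective:", total_loss)
```

In this toy run, no worker ever waits for the others between two local steps; each simply works from whatever version of the shared parameter it last read, which is the staleness-tolerant behavior the abstract describes.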





Cite this article
Joshi, B., Iutzeler, F. & Amini, MR. Large-scale asynchronous distributed learning based on parameter exchanges. Int J Data Sci Anal 5, 223–232 (2018). https://doi.org/10.1007/s41060-018-0110-5