
2D-THA-ADMM: communication efficient distributed ADMM algorithm framework based on two-dimensional torus hierarchical AllReduce

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Model synchronization refers to the communication required to keep model parameters consistent during large-scale distributed machine learning. As the cluster scales up, synchronizing model parameters across thousands of workers becomes a challenging coordination task. First, this study proposes a hierarchical AllReduce algorithm structured on a two-dimensional torus (2D-THA), which synchronizes model parameters level by level to maximize bandwidth utilization. Second, it introduces a distributed consensus algorithm, 2D-THA-ADMM, which combines the 2D-THA synchronization scheme with the alternating direction method of multipliers (ADMM). Third, we evaluate the parameter-synchronization performance of 2D-THA and the scalability of 2D-THA-ADMM on the Tianhe-2 supercomputing platform using real public datasets. Our experiments demonstrate that 2D-THA reduces synchronization time by \(63.447\%\) compared with MPI_Allreduce, and that 2D-THA-ADMM scales well, training more than 3\(\times \) faster than state-of-the-art methods while maintaining high accuracy and computational efficiency.
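To make the two-level synchronization pattern concrete, the following is a minimal sketch, assuming an mpi4py environment and a world size that factors exactly into ROWS x COLS. The names grid_allreduce, consensus_admm_step and local_solver, and the collapsing of each phase into a plain Allreduce over a sub-communicator, are illustrative assumptions rather than the authors' implementation, which the paper describes as a torus-structured hierarchical AllReduce combined with consensus ADMM.

```python
# Minimal sketch (not the paper's code): a two-phase AllReduce over a 2D
# process grid, followed by the consensus-ADMM averaging step that uses it.
# Assumes mpi4py and that the world size factors exactly as ROWS * COLS.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Arrange workers as a ROWS x COLS grid (torus wrap-around belongs to the
# underlying ring algorithm and is not modeled in this sketch).
ROWS = max(1, int(np.sqrt(size)))
COLS = size // ROWS
assert ROWS * COLS == size, "sketch assumes a factorable world size"
row_comm = comm.Split(color=rank // COLS, key=rank % COLS)   # peers in my row
col_comm = comm.Split(color=rank % COLS,  key=rank // COLS)  # peers in my column

def grid_allreduce(local_vec):
    """Hierarchical sum-AllReduce: reduce within each row first, then across
    the columns, so every phase uses only a small sub-communicator."""
    row_sum = np.empty_like(local_vec)
    row_comm.Allreduce(local_vec, row_sum, op=MPI.SUM)        # phase 1: rows
    global_sum = np.empty_like(local_vec)
    col_comm.Allreduce(row_sum, global_sum, op=MPI.SUM)       # phase 2: columns
    return global_sum

def consensus_admm_step(x_i, z, u_i, local_solver, rho=1.0):
    """One global-consensus ADMM iteration; the only communication is the
    averaging inside grid_allreduce. local_solver is a hypothetical routine
    that updates x_i against the local data shard."""
    x_i = local_solver(x_i, z, u_i, rho)                      # local x-update
    z = grid_allreduce(x_i + u_i) / size                      # global z-update
    u_i = u_i + x_i - z                                       # dual update
    return x_i, z, u_i
```

Run under mpiexec, each phase of grid_allreduce moves data only among the members of one row or one column, which is the bandwidth-friendly behaviour the abstract attributes to 2D-THA; a production implementation would additionally use reduce-scatter/allgather rings inside each level and pipeline the two phases.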


Data availability

The data underlying this article are available in the article.

Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#url.

  2. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#webspam.


Author information

Corresponding author

Correspondence to Yongmei Lei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, G., Lei, Y., Zhang, Z. et al. 2D-THA-ADMM: communication efficient distributed ADMM algorithm framework based on two-dimensional torus hierarchical AllReduce. Int. J. Mach. Learn. & Cyber. 15, 207–226 (2024). https://doi.org/10.1007/s13042-023-01903-9

