DOI: 10.1145/3605573.3605610

PSRA-HGADMM: A Communication Efficient Distributed ADMM Algorithm

Published: 13 September 2023

ABSTRACT

Among distributed machine learning algorithms, the global consensus alternating direction method of multipliers (ADMM) has attracted much attention because it can effectively solve large-scale optimization problems. However, its high communication cost slows convergence and limits scalability. To address this problem, we propose PSRA-HGADMM, a hierarchical grouping ADMM algorithm with a novel Ring-Allreduce communication model. First, we optimize the parameter exchange of the ADMM algorithm and implement global consensus ADMM in a decentralized architecture. Second, to improve the communication efficiency of the distributed system, we propose a novel Ring-Allreduce communication model (PSR-Allreduce) that incorporates the idea of the parameter server architecture. Finally, we design a Worker-Leader-Group generator (WLG) framework to handle the inconsistency of cluster nodes; it combines hierarchical parameter aggregation with a grouping strategy to improve the scalability of the distributed system. Experiments show that PSRA-HGADMM achieves better convergence performance and scalability than ADMMLib and AD-ADMM, and reduces overall communication cost by 32% compared with ADMMLib.
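
The abstract's starting point is the global consensus ADMM of Boyd et al. [2], which splits the problem min_x Σ_i f_i(x) across N workers by giving each worker i a local copy x_i constrained to equal a shared variable z. Each iteration performs a local minimization, a global averaging step, and a dual update:

    \begin{aligned}
    x_i^{k+1} &= \operatorname{arg\,min}_{x_i} \Big( f_i(x_i) + \frac{\rho}{2}\big\| x_i - z^k + u_i^k \big\|_2^2 \Big), \\
    z^{k+1}   &= \frac{1}{N} \sum_{i=1}^{N} \big( x_i^{k+1} + u_i^k \big), \\
    u_i^{k+1} &= u_i^k + x_i^{k+1} - z^{k+1}.
    \end{aligned}

The z-update averages the iterates of all N workers, i.e., it is an allreduce over the whole cluster, which is why the communication model dominates cost at scale. The sketch below simulates the baseline ring-allreduce pattern [5] that PSR-Allreduce extends: a scatter-reduce phase followed by an all-gather phase, each taking n − 1 steps in which every node exchanges only 1/n of the vector with its ring neighbor. The function name ring_allreduce and its list-of-arrays interface are illustrative choices, not the paper's API; the hierarchical grouping and parameter-server-style aggregation of PSR-Allreduce are not modeled here.

    import numpy as np

    def ring_allreduce(vectors):
        """Sum equal-length vectors as an n-node ring-allreduce would."""
        n = len(vectors)
        # Each simulated node splits its local vector into n chunks.
        chunks = [np.array_split(v.astype(float), n) for v in vectors]

        # Scatter-reduce: in step s, node i sends chunk (i - s) mod n to its
        # ring successor, which accumulates it. After n - 1 steps, node i
        # holds the fully reduced chunk (i + 1) mod n.
        for s in range(n - 1):
            for i in range(n):
                c = (i - s) % n
                chunks[(i + 1) % n][c] = chunks[(i + 1) % n][c] + chunks[i][c]

        # All-gather: in step s, node i forwards its completed chunk
        # (i + 1 - s) mod n to its successor, which overwrites its own copy.
        for s in range(n - 1):
            for i in range(n):
                c = (i + 1 - s) % n
                chunks[(i + 1) % n][c] = chunks[i][c].copy()

        return [np.concatenate(ch) for ch in chunks]

    # Usage: the z-update of consensus ADMM is the average of the summed
    # local iterates, identical on every node after the allreduce.
    locals_ = [np.arange(8.0) * (k + 1) for k in range(4)]
    summed = ring_allreduce(locals_)
    z_new = summed[0] / len(locals_)
    assert all(np.allclose(s, sum(locals_)) for s in summed)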

References

  1. Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. 2016. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning. PMLR, 173–182.
  2. Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3, 1 (2011), 1–122.
  3. Anis Elgabli, Jihong Park, Amrit S Bedi, Mehdi Bennis, and Vaneet Aggarwal. 2020. GADMM: Fast and communication efficient framework for distributed machine learning. J. Mach. Learn. Res. 21, 76 (2020), 1–39.
  4. Anis Elgabli, Jihong Park, Amrit Singh Bedi, Chaouki Ben Issaid, Mehdi Bennis, and Vaneet Aggarwal. 2020. Q-GADMM: Quantized group ADMM for communication efficient decentralized machine learning. IEEE Transactions on Communications 69, 1 (2020), 164–181.
  5. Andrew Gibiansky. 2017. Bringing HPC techniques to deep learning. Baidu Research, Tech. Rep. (2017).
  6. William Gropp, Ewing Lusk, and Anthony Skjellum. 1999. Using MPI: Portable Parallel Programming with the Message-Passing Interface. Vol. 1. MIT Press.
  7. Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. 2017. Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409 (2017).
  8. Qirong Ho, James Cipar, Henggang Cui, Jin Kyu Kim, and Eric P Xing. 2013. More effective distributed ML via a stale synchronous parallel parameter server. Advances in Neural Information Processing Systems 26 (2013), 1223–1231.
  9. Xin Huang, Guozheng Wang, and Yongmei Lei. 2021. GR-ADMM: A communication efficient algorithm based on ADMM. In 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). IEEE, 220–227.
  10. Melvin Johnson, Mike Schuster, Quoc V Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. 2017. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5 (2017), 339–351.
  11. Arun Kumar, Matthias Boehm, and Jun Yang. 2017. Data management in machine learning: Challenges, techniques, and systems. In Proceedings of the 2017 ACM International Conference on Management of Data. 1717–1722.
  12. Mu Li, David G Andersen, Alexander J Smola, and Kai Yu. 2014. Communication efficient distributed machine learning with the parameter server. Advances in Neural Information Processing Systems 27 (2014).
  13. Weiyu Li, Yaohua Liu, Zhi Tian, and Qing Ling. 2019. Communication-censored linearized ADMM for decentralized consensus optimization. IEEE Transactions on Signal and Information Processing over Networks 6 (2019), 18–34.
  14. Chih-Jen Lin, Ruby C Weng, and S Sathiya Keerthi. 2007. Trust region Newton methods for large-scale logistic regression. In Proceedings of the 24th International Conference on Machine Learning. 561–568.
  15. Qinyi Luo, Jinkun Lin, Youwei Zhuo, and Xuehai Qian. 2019. Hop: Heterogeneity-aware decentralized training. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 893–907.
  16. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
  17. Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems 28. Curran Associates, Inc., 3483–3491.
  18. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9.
  19. Zhuojun Tian, Zhaoyang Zhang, Jue Wang, Xiaoming Chen, Wei Wang, and Huaiyu Dai. 2020. Distributed ADMM with synergetic communication and computation. IEEE Transactions on Communications 69, 1 (2020), 501–517.
  20. Leslie G Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (1990), 103–111.
  21. Dongxia Wang, Yongmei Lei, Jinyang Xie, and Guozheng Wang. 2021. HSAC-ALADMM: An asynchronous lazy ADMM algorithm based on hierarchical sparse allreduce communication. The Journal of Supercomputing 77, 8 (2021), 8111–8134.
  22. Jinyang Xie and Yongmei Lei. 2019. ADMMLIB: A library of communication-efficient AD-ADMM for distributed machine learning. In IFIP International Conference on Network and Parallel Computing. Springer, 322–326.
  23. Zheng Xu, Mario Figueiredo, and Tom Goldstein. 2017. Adaptive ADMM with spectral penalty parameter selection. In Artificial Intelligence and Statistics. PMLR, 718–727.
  24. Kun-Hsing Yu, Andrew L Beam, and Isaac S Kohane. 2018. Artificial intelligence in healthcare. Nature Biomedical Engineering 2, 10 (2018), 719–731.
  25. Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P Xing. 2017. Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). 181–193.
  26. Ruiliang Zhang and James Kwok. 2014. Asynchronous distributed ADMM for consensus optimization. In International Conference on Machine Learning. PMLR, 1701–1709.

Published in

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023, 858 pages
ISBN: 9798400708435
DOI: 10.1145/3605573
Copyright © 2023 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

research-article · Research · Refereed limited

Acceptance Rates

Overall acceptance rate: 91 of 313 submissions, 29%
