ABSTRACT
Among distributed machine learning algorithms, the global consensus alternating direction method of multipliers (ADMM) has attracted considerable attention because it can effectively solve large-scale optimization problems. However, its high communication cost slows convergence and limits scalability. To address this problem, we propose a hierarchical grouping ADMM algorithm (PSRA-HGADMM) with a novel Ring-Allreduce communication model. First, we optimize the parameter exchange of the ADMM algorithm and implement the global consensus ADMM algorithm in a decentralized architecture. Second, to improve the communication efficiency of the distributed system, we propose a novel Ring-Allreduce communication model (PSR-Allreduce) based on the idea of the parameter server architecture. Finally, we design a Worker-Leader-Group generator (WLG) framework to handle the inconsistency of cluster nodes; it combines hierarchical parameter aggregation with a grouping strategy to improve the scalability of the distributed system. Experiments show that PSRA-HGADMM achieves better convergence performance and scalability than ADMMLib and AD-ADMM, and reduces overall communication cost by 32% compared with ADMMLib.
- Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. 2016. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning. PMLR, 173–182.
- Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3, 1 (2011), 1–122.
- Anis Elgabli, Jihong Park, Amrit S Bedi, Mehdi Bennis, and Vaneet Aggarwal. 2020. GADMM: Fast and communication efficient framework for distributed machine learning. J. Mach. Learn. Res. 21, 76 (2020), 1–39.
- Anis Elgabli, Jihong Park, Amrit Singh Bedi, Chaouki Ben Issaid, Mehdi Bennis, and Vaneet Aggarwal. 2020. Q-GADMM: Quantized group ADMM for communication efficient decentralized machine learning. IEEE Transactions on Communications 69, 1 (2020), 164–181.
- Andrew Gibiansky. 2017. Bringing HPC techniques to deep learning. Baidu Research, Tech. Rep. (2017).
- William Gropp, Ewing Lusk, and Anthony Skjellum. 1999. Using MPI: Portable Parallel Programming with the Message-Passing Interface. Vol. 1. MIT Press.
- Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. 2017. Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409 (2017).
- Qirong Ho, James Cipar, Henggang Cui, Jin Kyu Kim, and Eric P Xing. 2013. More effective distributed ML via a stale synchronous parallel parameter server. Advances in Neural Information Processing Systems 26 (2013), 1223.
- Xin Huang, Guozheng Wang, and Yongmei Lei. 2021. GR-ADMM: A communication efficient algorithm based on ADMM. In 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). IEEE, 220–227.
- Melvin Johnson, Mike Schuster, Quoc V Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. 2017. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5 (2017), 339–351.
- Arun Kumar, Matthias Boehm, and Jun Yang. 2017. Data management in machine learning: Challenges, techniques, and systems. In Proceedings of the 2017 ACM International Conference on Management of Data. 1717–1722.
- Mu Li, David G Andersen, Alexander J Smola, and Kai Yu. 2014. Communication efficient distributed machine learning with the parameter server. Advances in Neural Information Processing Systems 27 (2014).
- Weiyu Li, Yaohua Liu, Zhi Tian, and Qing Ling. 2019. Communication-censored linearized ADMM for decentralized consensus optimization. IEEE Transactions on Signal and Information Processing over Networks 6 (2019), 18–34.
- Chih-Jen Lin, Ruby C Weng, and S Sathiya Keerthi. 2007. Trust region Newton methods for large-scale logistic regression. In Proceedings of the 24th International Conference on Machine Learning. 561–568.
- Qinyi Luo, Jinkun Lin, Youwei Zhuo, and Xuehai Qian. 2019. Hop: Heterogeneity-aware decentralized training. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 893–907.
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
- Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems 28 (2015), 3483–3491.
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9.
- Zhuojun Tian, Zhaoyang Zhang, Jue Wang, Xiaoming Chen, Wei Wang, and Huaiyu Dai. 2020. Distributed ADMM with synergetic communication and computation. IEEE Transactions on Communications 69, 1 (2020), 501–517.
- Leslie G Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (1990), 103–111.
- Dongxia Wang, Yongmei Lei, Jinyang Xie, and Guozheng Wang. 2021. HSAC-ALADMM: An asynchronous lazy ADMM algorithm based on hierarchical sparse allreduce communication. The Journal of Supercomputing 77, 8 (2021), 8111–8134.
- Jinyang Xie and Yongmei Lei. 2019. ADMMLIB: A library of communication-efficient AD-ADMM for distributed machine learning. In IFIP International Conference on Network and Parallel Computing. Springer, 322–326.
- Zheng Xu, Mario Figueiredo, and Tom Goldstein. 2017. Adaptive ADMM with spectral penalty parameter selection. In Artificial Intelligence and Statistics. PMLR, 718–727.
- Kun-Hsing Yu, Andrew L Beam, and Isaac S Kohane. 2018. Artificial intelligence in healthcare. Nature Biomedical Engineering 2, 10 (2018), 719–731.
- Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P Xing. 2017. Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). 181–193.
- Ruiliang Zhang and James Kwok. 2014. Asynchronous distributed ADMM for consensus optimization. In International Conference on Machine Learning. PMLR, 1701–1709.
PSRA-HGADMM: A Communication Efficient Distributed ADMM Algorithm