DOI: 10.1145/3565287.3610273
Research Article

Anarchic Federated Learning with Delayed Gradient Averaging

Published: 16 October 2023

Abstract

Rapid advances in federated learning (FL) over the past few years have inspired a great deal of research on this emerging topic. Existing work on FL often assumes that clients participate in the learning process according to some particular pattern (such as balanced participation), in a synchronous manner, and/or with the same number of local iterations, yet these assumptions are often hard to satisfy in practice. In this paper, we propose AFL-DGA, an Anarchic Federated Learning algorithm with Delayed Gradient Averaging, which gives clients maximum freedom. In particular, AFL-DGA allows clients to 1) participate in any rounds; 2) participate asynchronously; 3) perform any number of local iterations; and 4) overlap gradient computation with gradient communication. AFL-DGA thus enables clients to participate in FL flexibly, according to their heterogeneous and time-varying computation and communication capabilities, and efficiently, by improving the utilization of their computation and communication resources. We characterize performance bounds on the learning loss of AFL-DGA as a function of clients' local iteration numbers, local model delays, and global model delays. Our results show that AFL-DGA achieves a convergence rate of [EQUATION] and a linear convergence speedup, matching existing benchmarks. The results also characterize the impact of various system parameters on the learning loss, providing useful insights. Numerical results demonstrate the efficiency of the proposed algorithm.
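To make the four degrees of freedom above concrete, the following is a minimal toy sketch of how an AFL-DGA-style training loop could be organized: clients show up in arbitrary rounds, start from a possibly stale global model, run their own number of local steps, and hand over the previous round's accumulated update while computing the current one. All names (Server, Client, run_round), the quadratic toy objective, and the step sizes are illustrative assumptions, not the authors' implementation or the paper's exact update rule.

import numpy as np

rng = np.random.default_rng(0)
DIM = 10
X_OPT = rng.normal(size=DIM)  # minimizer of the toy quadratic objective

def local_grad(x, noise=0.1):
    # Stochastic gradient of f(x) = 0.5 * ||x - X_OPT||^2 with additive noise.
    return (x - X_OPT) + noise * rng.normal(size=DIM)

class Server:
    # Holds the global model and applies whatever client updates arrive,
    # no matter how stale the global model they were computed from.
    def __init__(self, lr=0.5):
        self.x = np.zeros(DIM)
        self.lr = lr

    def apply(self, update, num_local_steps):
        # Normalize by the client's own local-step count so clients with
        # different workloads contribute on a comparable scale.
        self.x -= self.lr * update / num_local_steps

class Client:
    def __init__(self, cid, steps, local_lr=0.1):
        self.cid = cid
        self.steps = steps        # heterogeneous number of local iterations
        self.local_lr = local_lr
        self.in_flight = None     # previous update, "still being transmitted"

    def run_round(self, stale_global_model):
        # Overlap computation with communication: hand back the update that
        # finished transmitting while computing a new one from a stale model.
        delivered = self.in_flight
        x = stale_global_model.copy()
        accumulated = np.zeros(DIM)
        for _ in range(self.steps):
            g = local_grad(x)
            x -= self.local_lr * g
            accumulated += g
        self.in_flight = accumulated
        return delivered          # None in the client's first round

server = Server()
clients = [Client(cid=i, steps=int(rng.integers(1, 6))) for i in range(4)]

for t in range(200):
    for c in clients:
        if rng.random() < 0.6:                 # anarchic participation
            delivered = c.run_round(server.x)  # asynchronous, possibly stale
            if delivered is not None:
                server.apply(delivered, c.steps)

print("distance to optimum:", np.linalg.norm(server.x - X_OPT))

In the actual AFL-DGA analysis, the stale delivered updates and the heterogeneous local-step counts are what give rise to the local and global model-delay terms in the stated performance bounds; the sketch above only mirrors that structure on a toy problem.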


      Published In

      MobiHoc '23: Proceedings of the Twenty-fourth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
      October 2023
      621 pages
ISBN: 9781450399265
DOI: 10.1145/3565287
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. federated learning
      2. delayed gradient
      3. asynchronous

      Conference

      MobiHoc '23
      Acceptance Rates

Overall Acceptance Rate: 296 of 1,843 submissions, 16%
