DOI: 10.1145/3565287.3610273
Research Article

Anarchic Federated Learning with Delayed Gradient Averaging

Published: 16 October 2023

Abstract

Rapid advances in federated learning (FL) over the past few years have inspired a great deal of research on this emerging topic. Existing work on FL often assumes that clients participate in the learning process according to some particular pattern (such as balanced participation), in a synchronous manner, and/or with the same number of local iterations, yet these assumptions are often hard to satisfy in practice. In this paper, we propose AFL-DGA, an Anarchic Federated Learning algorithm with Delayed Gradient Averaging, which gives clients maximum freedom. In particular, AFL-DGA allows clients to 1) participate in any rounds; 2) participate asynchronously; 3) perform any number of local iterations; and 4) overlap gradient computation with gradient communication. AFL-DGA thus enables clients to participate in FL flexibly, according to their heterogeneous and time-varying computation and communication capabilities, and efficiently, by improving the utilization of their computation and communication resources. We characterize performance bounds on the learning loss of AFL-DGA as a function of clients' local iteration numbers, local model delays, and global model delays. Our results show that AFL-DGA achieves a convergence rate of [EQUATION] and a linear convergence speedup, matching existing benchmarks. The results also characterize the impact of various system parameters on the learning loss, providing useful insights. Numerical results demonstrate the efficiency of the proposed algorithm.
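To make the four degrees of freedom above concrete, the following is a minimal toy sketch of how an AFL-DGA-style training loop could be organized: clients show up in arbitrary rounds, start from a possibly stale global model, run their own number of local steps, and hand over the previous round's accumulated update while computing the current one. All names (Server, Client, run_round), the quadratic toy objective, and the step sizes are illustrative assumptions, not the authors' implementation or the paper's exact update rule.

import numpy as np

rng = np.random.default_rng(0)
DIM = 10
X_OPT = rng.normal(size=DIM)  # minimizer of the toy quadratic objective

def local_grad(x, noise=0.1):
    # Stochastic gradient of f(x) = 0.5 * ||x - X_OPT||^2 with additive noise.
    return (x - X_OPT) + noise * rng.normal(size=DIM)

class Server:
    # Holds the global model and applies whatever client updates arrive,
    # no matter how stale the global model they were computed from.
    def __init__(self, lr=0.5):
        self.x = np.zeros(DIM)
        self.lr = lr

    def apply(self, update, num_local_steps):
        # Normalize by the client's own local-step count so clients with
        # different workloads contribute on a comparable scale.
        self.x -= self.lr * update / num_local_steps

class Client:
    def __init__(self, cid, steps, local_lr=0.1):
        self.cid = cid
        self.steps = steps        # heterogeneous number of local iterations
        self.local_lr = local_lr
        self.in_flight = None     # previous update, "still being transmitted"

    def run_round(self, stale_global_model):
        # Overlap computation with communication: hand back the update that
        # finished transmitting while computing a new one from a stale model.
        delivered = self.in_flight
        x = stale_global_model.copy()
        accumulated = np.zeros(DIM)
        for _ in range(self.steps):
            g = local_grad(x)
            x -= self.local_lr * g
            accumulated += g
        self.in_flight = accumulated
        return delivered          # None in the client's first round

server = Server()
clients = [Client(cid=i, steps=int(rng.integers(1, 6))) for i in range(4)]

for t in range(200):
    for c in clients:
        if rng.random() < 0.6:                 # anarchic participation
            delivered = c.run_round(server.x)  # asynchronous, possibly stale
            if delivered is not None:
                server.apply(delivered, c.steps)

print("distance to optimum:", np.linalg.norm(server.x - X_OPT))

In the actual AFL-DGA analysis, the stale delivered updates and the heterogeneous local-step counts are what give rise to the local and global model-delay terms in the stated performance bounds; the sketch above only mirrors that structure on a toy problem.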


      Published In

      MobiHoc '23: Proceedings of the Twenty-fourth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
      October 2023
      621 pages
ISBN: 9781450399265
DOI: 10.1145/3565287
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. federated learning
      2. delayed gradient
      3. asynchronous

      Conference

      MobiHoc '23
      Acceptance Rates

Overall Acceptance Rate: 296 of 1,843 submissions, 16%
