ABSTRACT
Federated learning (FL) is a distributed optimization paradigm that learns from data samples distributed across a number of clients. Adaptive client selection that is cognizant of each client's training progress has become a major trend for improving FL efficiency, yet it remains poorly understood. Most existing FL methods, such as FedAvg and its state-of-the-art variants, implicitly assume that all learning phases during FL training are equally important. Recent findings on critical learning periods (CLP) show this assumption to be invalid: during a CLP, small gradient errors can cause an irrecoverable deficiency in final test accuracy. In this paper, we develop CriticalFL, a CLP-augmented FL framework, and show that adaptively augmenting existing FL methods with CLP, i.e., guiding client selection by the discovered CLP, significantly improves performance. Experiments with various machine learning models and datasets validate that CriticalFL consistently achieves higher model accuracy while maintaining better communication efficiency than state-of-the-art methods, demonstrating a promising and easily adopted approach for tackling the heterogeneity of FL training.
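To make the CLP-guided selection idea concrete, the following is a minimal Python sketch, not the paper's exact algorithm. It assumes, purely for illustration, that a critical learning period is flagged while the smoothed norm of the global model update is still changing rapidly, and that the server reacts by sampling more clients per round during the detected CLP. The function names (detect_clp, select_clients) and parameters (window, threshold, base_k, boost) are hypothetical stand-ins; the paper's actual detection statistic and selection rule may differ.

```python
import random

def detect_clp(grad_norms, window=5, threshold=0.01):
    """Heuristic CLP detector (illustrative only): compare the average
    global-update norm over the last `window` rounds against the previous
    `window` rounds. A large relative change suggests training dynamics
    are still in flux, i.e., we are inside a critical learning period."""
    if len(grad_norms) < 2 * window:
        return True  # too early to tell; treat the start of training as critical
    recent = sum(grad_norms[-window:]) / window
    earlier = sum(grad_norms[-2 * window:-window]) / window
    return abs(recent - earlier) / max(earlier, 1e-12) > threshold

def select_clients(all_clients, grad_norms, base_k=10, boost=2):
    """CLP-guided selection: spend more of the client-sampling budget
    during the detected CLP by boosting the per-round sample size."""
    k = base_k * boost if detect_clp(grad_norms) else base_k
    return random.sample(all_clients, min(k, len(all_clients)))

# Usage: early in training the norm history is short/volatile, so more
# clients are sampled; once updates flatten, selection reverts to base_k.
chosen = select_clients(list(range(100)), grad_norms=[2.0, 1.8, 1.5])
```

The design choice mirrors the abstract's claim: gradient errors during the CLP are disproportionately costly, so a CLP-aware server concentrates client participation in the rounds where it matters most.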