
Pisces: efficient federated learning via guided asynchronous training

Published: 07 November 2022

Abstract

Federated learning (FL) is typically performed in a synchronous parallel manner, in which a single slow client delays the entire training round. Current FL systems employ a participant selection strategy that favors fast clients with quality data in each iteration. However, such clients are not always available in practice, and the selection strategy has to navigate a knotty tradeoff between speed and data quality.
This paper makes a case for asynchronous FL by presenting Pisces, a new FL system with intelligent participant selection and model aggregation for accelerated training despite slow clients. To avoid incurring excessive resource cost and stale training computation, Pisces uses a novel scoring mechanism to identify suitable clients to participate in each training iteration. It also adapts the aggregation pace dynamically to bound the progress gap between the participating clients and the server, with a provable convergence guarantee in a smooth non-convex setting. We have implemented Pisces in an open-source FL platform, Plato, and evaluated its performance in large-scale experiments with popular vision and language models. Pisces outperforms the state-of-the-art synchronous and asynchronous alternatives, reducing the time-to-accuracy by up to 2.0X and 1.9X, respectively.
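The two ideas in the abstract, scoring clients on a mix of usefulness and speed, and refusing updates whose staleness exceeds a bound, can be illustrated with a minimal sketch. This is not Pisces's actual scoring formula or aggregation rule; the weighting `alpha`, the linear score, and the fixed staleness bound are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's algorithm): score clients by a weighted
# mix of statistical utility and latency, and bound the staleness gap between
# a participating client and the server before accepting its update.
import heapq


def client_score(utility, latency, alpha=0.5):
    """Higher utility is better; higher latency is worse (hypothetical formula)."""
    return alpha * utility - (1 - alpha) * latency


def select_clients(clients, k, alpha=0.5):
    """Pick the k highest-scoring clients.

    `clients` maps a client id to a (utility, latency) pair.
    """
    return heapq.nlargest(
        k, clients, key=lambda c: client_score(*clients[c], alpha=alpha))


def should_aggregate(client_version, server_version, max_staleness=4):
    """Accept an update only if its model version lags the server by at most
    `max_staleness` iterations (an assumed fixed bound)."""
    return server_version - client_version <= max_staleness


clients = {"a": (0.9, 0.2), "b": (0.4, 0.1), "c": (0.8, 0.9)}
picked = select_clients(clients, k=2)  # fast clients with useful data win
```

In an asynchronous loop, the server would call `select_clients` each time it dispatches work and gate every arriving update with `should_aggregate`; the paper's contribution is choosing the score and the pace adaptively rather than with fixed constants as above.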




Published In

SoCC '22: Proceedings of the 13th Symposium on Cloud Computing
November 2022
574 pages
ISBN:9781450394147
DOI:10.1145/3542929
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. asynchronous training
  2. efficiency
  3. federated learning

Qualifiers

  • Research-article

Funding Sources

  • Research Grants Council & ACCESS (AI Chip Center for Emerging Smart Systems), HKSAR
  • Research Grants Council

Conference

SoCC '22
Sponsor:
SoCC '22: ACM Symposium on Cloud Computing
November 7 - 11, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 166
  • Downloads (last 6 weeks): 3
Reflects downloads up to 17 Feb 2025

Cited By
  • (2024) FedASMU. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence. DOI: 10.1609/aaai.v38i12.29297, pp. 13900-13908. Online publication date: 20-Feb-2024.
  • (2024) FedRoLA: Robust Federated Learning Against Model Poisoning via Layer-based Aggregation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. DOI: 10.1145/3637528.3671906, pp. 3667-3678. Online publication date: 25-Aug-2024.
  • (2024) Staleness-Controlled Asynchronous Federated Learning: Accuracy and Efficiency Tradeoff. IEEE Transactions on Mobile Computing 23(12). DOI: 10.1109/TMC.2024.3416216, pp. 12621-12634. Online publication date: Dec-2024.
  • (2024) Polaris: Accelerating Asynchronous Federated Learning With Client Selection. IEEE Transactions on Cloud Computing 12(2). DOI: 10.1109/TCC.2024.3370688, pp. 446-458. Online publication date: Apr-2024.
  • (2024) An Accurate and Efficient Clustered Federated Learning for Mobile Edge Devices. 2024 IEEE/ACM Symposium on Edge Computing (SEC). DOI: 10.1109/SEC62691.2024.00017, pp. 110-122. Online publication date: 4-Dec-2024.
  • (2024) Democratizing the Federation in Federated Learning. 2024 IEEE 21st International Conference on Mobile Ad-Hoc and Smart Systems (MASS). DOI: 10.1109/MASS62177.2024.00017, pp. 38-46. Online publication date: 23-Sep-2024.
  • (2024) Asynchronous Federated Multi Microgrid Energy Management Method Considering Carbon Trading. 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI). DOI: 10.1109/ICETCI61221.2024.10594337, pp. 224-229. Online publication date: 24-May-2024.
  • (2024) Apodotiko: Enabling Efficient Serverless Federated Learning in Heterogeneous Environments. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid). DOI: 10.1109/CCGrid59990.2024.00032, pp. 206-215. Online publication date: 6-May-2024.
  • (2024) An Efficient Privacy-Preserving Asynchronous Federated Approach for Intelligent Decision Making in Equipment Maintenance. 2024 IEEE 7th International Conference on Big Data and Artificial Intelligence (BDAI). DOI: 10.1109/BDAI62182.2024.10692438, pp. 136-141. Online publication date: 5-Jul-2024.
  • (2023) Auxo. Proceedings of the 2023 ACM Symposium on Cloud Computing. DOI: 10.1145/3620678.3624651, pp. 125-141. Online publication date: 30-Oct-2023.
