DOI: 10.1145/3545008.3545030

Spread: Decentralized Model Aggregation for Scalable Federated Learning

Published: 13 January 2023

Abstract

Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that the edge devices belong to individuals: they are not “owned” by the FL coordinator but are “federated” to it, so the number of participating devices can be enormous. In the prevailing distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. With a large number of edge devices, centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability.
In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture in which nodes are organized into clusters so that model aggregation can be offloaded to selected edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates inter-cluster and intra-cluster model training at runtime. We implement a functional system by extending an existing federated learning system. Our evaluation shows that Spread resolves the bottleneck of centralized model aggregation: it yields an 8.05× and a 5.58× model-training speedup over existing FL systems built on the PS and allReduce architectures, respectively.
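
To illustrate the tiering idea described in the abstract, the following minimal Python sketch (ours, not the authors' implementation) contrasts flat PS-style aggregation, where the coordinator averages every device model, with a two-tier scheme in which a designated aggregator in each cluster averages its members' models before a single cross-cluster average is taken. The toy `Model` type, the function names, and the weighting by local sample counts are illustrative assumptions, not details from the paper.

from typing import Dict, List

Model = Dict[str, float]  # toy stand-in for a flat parameter vector


def weighted_average(models: List[Model], weights: List[float]) -> Model:
    """FedAvg-style weighted average (weights ~ local sample counts)."""
    total = sum(weights)
    return {k: sum(w * m[k] for m, w in zip(models, weights)) / total
            for k in models[0]}


def flat_ps_aggregation(device_models: List[Model],
                        samples: List[float]) -> Model:
    """Centralized PS: the coordinator averages every device model itself."""
    return weighted_average(device_models, samples)


def tiered_aggregation(clusters: List[List[Model]],
                       cluster_samples: List[List[float]]) -> Model:
    """Two-tier aggregation: average inside each cluster first (at an
    aggregator edge device), then average the per-cluster models."""
    cluster_models = [weighted_average(ms, ws)
                      for ms, ws in zip(clusters, cluster_samples)]
    cluster_weights = [sum(ws) for ws in cluster_samples]  # cluster data size
    return weighted_average(cluster_models, cluster_weights)


if __name__ == "__main__":
    devices = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}, {"w": 4.0}]
    samples = [10, 30, 20, 40]
    # Both paths give the same global model ({'w': 2.9}), but the coordinator
    # touches 2 models instead of 4 in the tiered path.
    print(flat_ps_aggregation(devices, samples))
    print(tiered_aggregation([devices[:2], devices[2:]],
                             [samples[:2], samples[2:]]))

Both paths produce the same weighted average; the difference is that in the tiered path the coordinator handles one model per cluster rather than one per device, which is where the scalability gain comes from.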

    Published In

    ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
    August 2022
    976 pages
    ISBN: 9781450397339
    DOI: 10.1145/3545008

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Federated learning
    2. model aggregation
    3. scalability
    4. tiering

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP '22
    ICPP '22: 51st International Conference on Parallel Processing
    August 29 - September 1, 2022
    Bordeaux, France

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

    Cited By

    • (2025) Spread+: Scalable Model Aggregation in Federated Learning With Non-IID Data. IEEE Transactions on Parallel and Distributed Systems 36(4), 701–716. DOI: 10.1109/TPDS.2025.3539738. Online publication date: Apr-2025.
    • (2024) A Survey on Spatio-Temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications. IEEE Transactions on Big Data 10(2), 174–193. DOI: 10.1109/TBDATA.2023.3342619. Online publication date: Apr-2024.
    • (2024) A comprehensive survey of federated transfer learning: challenges, methods and applications. Frontiers of Computer Science 18(6). DOI: 10.1007/s11704-024-40065-x. Online publication date: 23-Jul-2024.
    • (2023) Learning From Your Neighbours: Mobility-Driven Device-Edge-Cloud Federated Learning. Proceedings of the 52nd International Conference on Parallel Processing, 462–471. DOI: 10.1145/3605573.3605643. Online publication date: 7-Aug-2023.
    • (2023) Model Poisoning Attack Against Federated Learning with Adaptive Aggregation. Adversarial Multimedia Forensics, 1–27. DOI: 10.1007/978-3-031-49803-9_1. Online publication date: 15-Nov-2023.
