DOI: 10.1145/3545008.3545030

Spread: Decentralized Model Aggregation for Scalable Federated Learning

Published: 13 January 2023

Abstract

Federated learning (FL) is a new distributed machine learning paradigm that enables machine learning on edge devices. One unique feature of FL is that the edge devices belong to individuals: they are not “owned” by the FL coordinator but are “federated” to it, so the number of participating devices can be enormous. In the prevailing distributed ML architecture, the parameter server (PS) architecture, model aggregation is centralized. With a large number of edge devices, centralized model aggregation becomes the bottleneck and fundamentally restricts system scalability.
In this paper, we present Spread to decentralize model aggregation. Spread is a tiered architecture in which nodes are organized into clusters so that model aggregation can be offloaded to selected edge devices. We design a Spread-based FL system: it employs a new algorithm for cluster construction and an adaptive algorithm that regulates inter-cluster and intra-cluster model training at runtime. We implement a functional system by extending an existing federated learning system. Our evaluation shows that Spread resolves the bottleneck of centralized model aggregation: it yields an 8.05× and a 5.58× model-training speedup over existing FL systems built on the PS and allReduce architectures, respectively.
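
To illustrate the tiering idea described in the abstract, the following minimal Python sketch (ours, not the authors' implementation) contrasts flat PS-style aggregation, where the coordinator averages every device model, with a two-tier scheme in which a designated aggregator in each cluster averages its members' models before a single cross-cluster average is taken. The toy `Model` type, the function names, and the weighting by local sample counts are illustrative assumptions, not details from the paper.

from typing import Dict, List

Model = Dict[str, float]  # toy stand-in for a flat parameter vector


def weighted_average(models: List[Model], weights: List[float]) -> Model:
    """FedAvg-style weighted average (weights ~ local sample counts)."""
    total = sum(weights)
    return {k: sum(w * m[k] for m, w in zip(models, weights)) / total
            for k in models[0]}


def flat_ps_aggregation(device_models: List[Model],
                        samples: List[float]) -> Model:
    """Centralized PS: the coordinator averages every device model itself."""
    return weighted_average(device_models, samples)


def tiered_aggregation(clusters: List[List[Model]],
                       cluster_samples: List[List[float]]) -> Model:
    """Two-tier aggregation: average inside each cluster first (at an
    aggregator edge device), then average the per-cluster models."""
    cluster_models = [weighted_average(ms, ws)
                      for ms, ws in zip(clusters, cluster_samples)]
    cluster_weights = [sum(ws) for ws in cluster_samples]  # cluster data size
    return weighted_average(cluster_models, cluster_weights)


if __name__ == "__main__":
    devices = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}, {"w": 4.0}]
    samples = [10, 30, 20, 40]
    # Both paths give the same global model ({'w': 2.9}), but the coordinator
    # touches 2 models instead of 4 in the tiered path.
    print(flat_ps_aggregation(devices, samples))
    print(tiered_aggregation([devices[:2], devices[2:]],
                             [samples[:2], samples[2:]]))

Both paths produce the same weighted average; the difference is that in the tiered path the coordinator handles one model per cluster rather than one per device, which is where the scalability gain comes from.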

    Published In

    ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
    August 2022
    976 pages
    ISBN: 9781450397339
    DOI: 10.1145/3545008

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Federated learning
    2. model aggregation
    3. scalability
    4. tiering

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP '22
    ICPP '22: 51st International Conference on Parallel Processing
    August 29 - September 1, 2022
    Bordeaux, France

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

    Cited By

    • (2025) Spread+: Scalable Model Aggregation in Federated Learning With Non-IID Data. IEEE Transactions on Parallel and Distributed Systems 36(4), 701–716. DOI: 10.1109/TPDS.2025.3539738. Online publication date: Apr-2025.
    • (2024) A Survey on Spatio-Temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications. IEEE Transactions on Big Data 10(2), 174–193. DOI: 10.1109/TBDATA.2023.3342619. Online publication date: Apr-2024.
    • (2024) A comprehensive survey of federated transfer learning: challenges, methods and applications. Frontiers of Computer Science 18(6). DOI: 10.1007/s11704-024-40065-x. Online publication date: 23-Jul-2024.
    • (2023) Learning From Your Neighbours: Mobility-Driven Device-Edge-Cloud Federated Learning. Proceedings of the 52nd International Conference on Parallel Processing, 462–471. DOI: 10.1145/3605573.3605643. Online publication date: 7-Aug-2023.
    • (2023) Model Poisoning Attack Against Federated Learning with Adaptive Aggregation. Adversarial Multimedia Forensics, 1–27. DOI: 10.1007/978-3-031-49803-9_1. Online publication date: 15-Nov-2023.
