Abstract
Federated learning faces challenges in real-world deployment due to limited client resources and the straggler problem caused by high system heterogeneity. Although model pruning has been used to reduce the training and communication overhead of federated learning, a uniform pruning ratio fundamentally fails to address the efficiency loss that stragglers cause in heterogeneous systems. Adapting pruned sub-models to individual device capabilities is therefore crucial, yet remains under-explored. In this work, we propose AdaPruneFL, a data-free adaptive structured pruning algorithm that formulates adaptive pruning in federated learning as an optimization problem constrained to align the response latency of each client's local training, yielding a fine-grained, per-client model compression ratio. Combined with sequential structured pruning, we extract heterogeneous yet aggregable sub-model structures matched to the capabilities of client devices, accelerating training in a hardware-friendly manner while mitigating the straggler effect. Extensive experiments demonstrate that, compared to FedAvg, AdaPruneFL trains 1.38–3.88x faster on general-purpose hardware platforms while maintaining comparable convergence accuracy.
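As a hedged illustration of the latency-alignment idea (our own sketch, not the paper's formulation: the linear cost model and the names comp_coeff, bandwidth, model_flops, and model_size are assumptions), the snippet below derives per-client pruning ratios in closed form so that every client's estimated round latency matches the fastest client's full-model latency:

```python
import numpy as np

def adaptive_pruning_ratios(comp_coeff, bandwidth, model_flops, model_size):
    """Illustrative only: assumes compute and communication costs both scale
    linearly with the retained parameter fraction (1 - rho)."""
    comp_coeff = np.asarray(comp_coeff, dtype=float)   # seconds per FLOP
    bandwidth = np.asarray(bandwidth, dtype=float)     # bytes per second
    # Full-model round latency (local computation + model upload) per client.
    full_latency = comp_coeff * model_flops + model_size / bandwidth
    # Align every client to the fastest client's full-model latency:
    # (1 - rho_k) * full_latency_k = target  =>  rho_k = 1 - target / full_latency_k.
    target = full_latency.min()
    return np.clip(1.0 - target / full_latency, 0.0, 0.95)

# Three clients with increasingly weaker hardware and links (made-up numbers).
print(adaptive_pruning_ratios(comp_coeff=[2e-10, 5e-10, 1e-9],
                              bandwidth=[12e6, 6e6, 3e6],
                              model_flops=4e9, model_size=44e6))
```

Slower clients receive larger pruning ratios, which is exactly the straggler-mitigation behavior the abstract describes.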
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA19020102.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
Appendix
Experiment setup details
Data partition We use a Dirichlet(\(\gamma \)) distribution to generate the non-IID data partition among clients, where \(\gamma \) controls the skewness of the data distribution. In our experiments, we partitioned the CIFAR-10/100 datasets using \(\gamma =0.5\). Because the Tiny ImageNet dataset has a larger number of classes, we used a smaller value of \(\gamma =0.2\) to produce its heterogeneous data distribution. The resulting data partitions for the different training scenarios are illustrated in Fig. 13.
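A minimal sketch of this partitioning scheme, assuming a standard class-wise Dirichlet split (the helper below is our own reconstruction, not the paper's exact script):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, gamma, seed=0):
    """Split sample indices across clients, drawing per-class client
    proportions from Dir(gamma); smaller gamma gives a more skewed split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Per-client share of class c, sampled from a Dirichlet distribution.
        proportions = rng.dirichlet(np.full(num_clients, float(gamma)))
        # Convert shares into split points over this class's shuffled samples.
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, splits)):
            client_indices[k].extend(part.tolist())
    return client_indices

# Example: 10 clients over a toy 10-class label vector (500 samples per class).
parts = dirichlet_partition(np.repeat(np.arange(10), 500), num_clients=10, gamma=0.5)
print([len(p) for p in parts])  # uneven per-client sample counts
```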
Model architecture details The model architectures used in our experiments are detailed in Table 9.
Bandwidth allocation In our experiments, we simulate heterogeneous device communication capabilities by randomly allocating communication bandwidth within the 4G range. The detailed per-client bandwidth allocation is shown in Table 10. We additionally allow random perturbations of up to 2 MB/s to simulate network fluctuations.
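As a minimal sketch of this simulation (our own illustration: the 4G base-rate range of roughly 5–60 MB/s is an assumption, since the actual allocation is given in Table 10), one could draw per-client base rates once and jitter them each round:

```python
import numpy as np

rng = np.random.default_rng(42)
num_clients = 10
# Fixed per-client base rates, drawn once from an assumed 4G range (MB/s).
base_bandwidth = rng.uniform(5.0, 60.0, size=num_clients)

def round_bandwidth(base):
    # Perturb each client's rate by up to +/- 2 MB/s to mimic fluctuations.
    jitter = rng.uniform(-2.0, 2.0, size=base.shape)
    return np.clip(base + jitter, 1.0, None)  # keep rates positive

print(round_bandwidth(base_bandwidth))
```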
Solution of the optimization problem
Using a linear form to approximate the relationship between computational complexity and computation latency, and combining Equations (4) and (5), we obtain the following optimization problem:
According to the Lagrange multiplier method, the Lagrangian function for (8) is constructed as:
where \(\mu \ge 0\) is the Lagrange multiplier. According to the Karush–Kuhn–Tucker (KKT) conditions, we have:
By rearranging (10), we can conclude:
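Since the displayed equations referenced as (8)–(10) do not appear in this version, the following display is a hedged sketch only: it shows the standard Lagrangian/KKT pattern for a generic constrained problem of this shape, with \(f\), \(g\), and \(\rho \) standing in for the paper's actual objective, latency-alignment constraint, and pruning variable.

\[
\begin{aligned}
&\min _{\rho }\; f(\rho ) \quad \text {s.t.}\quad g(\rho ) \le 0,\\
&\mathcal {L}(\rho ,\mu ) = f(\rho ) + \mu \, g(\rho ), \qquad \mu \ge 0,\\
&\text {KKT:}\quad f'(\rho ) + \mu \, g'(\rho ) = 0, \qquad \mu \, g(\rho ) = 0, \qquad g(\rho ) \le 0,\\
&\Rightarrow \; f'(\rho ^{*}) = -\mu \, g'(\rho ^{*}), \quad \text {with } \mu = 0 \text { when the constraint is inactive.}
\end{aligned}
\]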
About this article
Cite this article
Fan, W., Yang, K., Wang, Y. et al. Data-free adaptive structured pruning for federated learning. J Supercomput 80, 18600–18626 (2024). https://doi.org/10.1007/s11227-024-06162-1