
Data-free adaptive structured pruning for federated learning

The Journal of Supercomputing

Abstract

Federated learning faces challenges in real-world deployments due to limited client resources and the straggler problem caused by high system heterogeneity. Although model pruning can reduce the training and communication overhead of federated learning, a uniform pruning ratio fundamentally fails to address the efficiency impact of stragglers in heterogeneous systems. Adapting the pruned sub-models to individual device capabilities is therefore crucial, yet it remains under-researched. In this work, we propose AdaPruneFL, a data-free adaptive structured pruning algorithm that formulates adaptive pruning in federated learning as an optimization problem constrained to align the response latency of each client’s local training, thereby identifying a fine-grained, device-adaptive model compression ratio. Combined with sequential structured pruning, we extract heterogeneous but aggregable sub-model structures matched to client device capabilities, accelerating training in a hardware-friendly manner while mitigating the straggler effect. Extensive experiments demonstrate that, compared to FedAvg, AdaPruneFL achieves 1.38–3.88x faster training on general-purpose hardware platforms while maintaining comparable convergence accuracy.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. https://www.kaggle.com/c/tiny-imagenet.

Acknowledgements

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA19020102.

Author information

Corresponding author

Correspondence to Jing Li.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Experiment setup details

Data partition We use a Dirichlet(\(\gamma \)) distribution to generate the non-IID data partition among clients, where \(\gamma \) controls the skewness of the data distribution. In our experiments, we partition the CIFAR-10/100 datasets using \(\gamma =0.5\). Because the TinyImagenet dataset has a larger number of classes, we use a smaller value of \(\gamma =0.2\) to generate its heterogeneous data distribution. The resulting data partitions for the different training scenarios are illustrated in Fig. 13:

Fig. 13: Data partitioning results for different datasets
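
For reference, the snippet below is a minimal sketch of one way to generate such a Dirichlet(\(\gamma \)) non-IID partition. It assumes NumPy only; the function name dirichlet_partition and the placeholder label array are illustrative and are not taken from the paper's released code.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, gamma, seed=0):
    """Split sample indices among clients with a per-class Dirichlet(gamma) prior.

    Smaller gamma -> more skewed (more heterogeneous) client label distributions.
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]

    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Proportion of class-c samples assigned to each client.
        proportions = rng.dirichlet(np.full(num_clients, gamma))
        # Convert cumulative proportions to split points over the shuffled indices.
        split_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, split_points)):
            client_indices[client_id].extend(part.tolist())

    return [np.array(ci) for ci in client_indices]

# Example: CIFAR-10-style labels, 10 clients, gamma = 0.5 (as used in the paper).
labels = np.random.randint(0, 10, size=50_000)  # placeholder for real CIFAR-10 labels
parts = dirichlet_partition(labels, num_clients=10, gamma=0.5)
print([len(p) for p in parts])
```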

Model architecture details The model architectures used in our experiments are detailed in Table 9.

Bandwidth allocation In our experiments, we simulate the heterogeneity of device communication capabilities by randomly allocating communication bandwidth within a 4G range. The detailed client bandwidth allocation is shown in Table 10. We allow random perturbations of up to 2 MB/s to simulate network fluctuations.

Table 9 Model architectures
Table 10 Bandwidth allocation of clients
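
As a small illustration of the bandwidth simulation described above, the sketch below samples a fixed per-client bandwidth from a 4G-like range and adds a bounded per-round perturbation. The range, client count, and function names are assumed placeholders, not the values of Table 10.

```python
import numpy as np

rng = np.random.default_rng(42)
NUM_CLIENTS = 10

# Nominal per-client bandwidth (MB/s), sampled once; the 4G-like range is illustrative.
nominal_bw = rng.uniform(5.0, 40.0, size=NUM_CLIENTS)

def sample_round_bandwidth():
    """Per-round bandwidth: nominal value plus a perturbation bounded by 2 MB/s."""
    jitter = rng.uniform(-2.0, 2.0, size=NUM_CLIENTS)
    return np.clip(nominal_bw + jitter, 0.5, None)  # keep bandwidth strictly positive

def upload_latency(submodel_size_mb, bw_mb_s):
    """Communication latency (seconds) for uploading a sub-model of the given size."""
    return submodel_size_mb / bw_mb_s

print(upload_latency(10.0, sample_round_bandwidth()))  # per-client upload time, 10 MB sub-model
```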

Solution of the optimization problem

Using a linear form to approximate the relationship between computational complexity and computational latency, and combining equations (4) and (5), we obtain the following optimization problem:

$$\begin{aligned} \min _{p^c_l}\ -\sum ^N_{l=1}\log {p^c_l} \quad \text {subject to}\quad&|\mathcal {D}^c|\cdot \left( k \sum ^N_{l=1}\beta _l\cdot p^c_l + b\right) +\frac{\sum ^N_{l=1}\alpha _l\cdot p^c_l}{B^c} \le t_{target}, \\&0<p^c_l\le 1,\ \forall \, 1\le l\le N \end{aligned}$$
(8)

According to the Lagrange multiplier method, the Lagrangian function for (8) is constructed as:

$$\begin{aligned} L(p^c_l, \mu ) = -\sum ^N_{l=1}\log {p^c_l} + \mu&\left( |\mathcal {D}^c|\cdot \left( k \sum ^N_{l=1}\beta _l\cdot p^c_l + b\right) +\frac{\sum ^N_{l=1}\alpha _l\cdot p^c_l}{B^c} - t_{target}\right) , \\&0<p^c_l\le 1,\ \forall \, 1\le l\le N \end{aligned}$$
(9)

where \(\mu \ge 0\) is the Lagrange multiplier. According to the Karush–Kuhn–Tucker (KKT) conditions, we have:

$$\begin{aligned}&\frac{\partial L}{\partial p^c_l} = -\frac{1}{p^c_l} + \mu \left( |\mathcal {D}^c|\, k\, \beta _l + \frac{\alpha _l}{B^c}\right) = 0\\&\frac{\partial L}{\partial \mu } = |\mathcal {D}^c|\left( k \sum ^N_{l=1}\beta _l\cdot p^c_l + b\right) + \frac{\sum ^N_{l=1}\alpha _l\cdot p^c_l}{B^c} - t_{target} = 0\\&\mu \left( |\mathcal {D}^c|\left( k \sum ^N_{l=1}\beta _l\cdot p^c_l + b\right) + \frac{\sum ^N_{l=1}\alpha _l\cdot p^c_l}{B^c} - t_{target}\right) = 0\\&0<p^c_l\le 1,\ \forall \, 1\le l\le N \end{aligned}$$
(10)

Solving the stationarity condition in (10) for \(p^c_l\), projecting onto the feasible interval \((0,1]\), and taking the latency constraint to be active (\(\mu >0\)), we conclude:

$$\begin{aligned} \left\{ \begin{aligned}&p^c_l=\min \left( \frac{B^c}{\mu \left( |\mathcal {D}^c|\cdot k \cdot \beta _l \cdot B^c+\alpha _l\right) },\,1\right) \\&|\mathcal {D}^c|\left( k \sum ^N_{l=1}\beta _l\, p^c_l+b\right) +\frac{\sum ^N_{l=1}\alpha _l\, p^c_l}{B^c}=t_{target}\\&\mu >0 \end{aligned} \right. \end{aligned}$$
(11)
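
In practice, (11) can be evaluated by a one-dimensional search over \(\mu \): each \(p^c_l\) follows the closed form in the first line, and \(\mu \) is chosen, for example by bisection, so that the latency constraint holds with equality (the left-hand side is non-increasing in \(\mu \)). The sketch below illustrates this; the function name adaptive_ratios and all numeric values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adaptive_ratios(alpha, beta, k, b, D, B, t_target,
                    mu_lo=1e-8, mu_hi=1e8, iters=100):
    """Per-layer keep ratios p_l from Eq. (11): closed form in mu, mu found by bisection.

    alpha[l]: communication cost of layer l, beta[l]: computation cost of layer l,
    (k, b): linear fit of computation latency vs. complexity, D: local dataset size,
    B: client bandwidth, t_target: target per-round latency.
    """
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)

    def ratios(mu):
        return np.minimum(B / (mu * (D * k * beta * B + alpha)), 1.0)

    def latency(p):
        return D * (k * np.dot(beta, p) + b) + np.dot(alpha, p) / B

    if latency(ratios(mu_lo)) <= t_target:   # even the unpruned model meets the target
        return np.ones_like(alpha)
    if latency(ratios(mu_hi)) > t_target:    # target unreachable: return the smallest ratios
        return ratios(mu_hi)

    for _ in range(iters):                   # latency(ratios(mu)) is non-increasing in mu
        mu = np.sqrt(mu_lo * mu_hi)          # bisect in log space for numerical range
        if latency(ratios(mu)) > t_target:
            mu_lo = mu
        else:
            mu_hi = mu
    return ratios(mu_hi)

# Illustrative numbers only (not from the paper).
p = adaptive_ratios(alpha=[2.0, 4.0, 8.0], beta=[1.0, 2.0, 3.0],
                    k=1e-4, b=1e-3, D=500, B=10.0, t_target=1.5)
print(p)
```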

Cite this article

Fan, W., Yang, K., Wang, Y. et al. Data-free adaptive structured pruning for federated learning. J Supercomput 80, 18600–18626 (2024). https://doi.org/10.1007/s11227-024-06162-1
