FedBat: a self-adapting bat algorithm-based federated learning approach

The Journal of Supercomputing

Abstract

Federated learning (FL) is an advanced distributed machine learning (ML) framework designed to address data silos and data privacy. In real-world applications, common problems such as non-convex optimization and non-independent and identically distributed (Non-IID) client data reduce training efficiency, trap training in local optima, and degrade performance. We therefore propose an FL scheme based on the bat algorithm (FedBat), which leverages the echolocation mechanism of bats to balance global and local search, enabling the algorithm to escape local optima with a certain probability. By combining global optimal model weight optimization with a dynamically adjusted search strategy, FedBat guides weaker client models toward the global optimum, thereby accelerating convergence. FedBat also adapts its parameters across different datasets. To mitigate client drift, we extend FedBat with the Jensen–Shannon (JS) divergence to quantify the difference between local and global models; each client decides whether to upload its local model based on this divergence, which improves the global model's generalization and reduces communication overhead. Experimental results demonstrate that FedBat converges 5 times faster and improves test accuracy by more than 40% compared to FedAvg. The extended FedBat effectively mitigates the loss in the global model's generalization performance and reduces communication costs by approximately 20%. Comparisons with FedPso, FedGwo, and FedProx show that FedBat is superior in both convergence speed and test accuracy. Furthermore, we derive the formula for the expected convergence rate of FedBat, analyze the impact of various parameters on FL performance, and establish an upper bound on FedBat's model divergence.
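
To make the divergence-gated upload concrete, the sketch below shows how a client might compute a JS divergence between its local weights and the global weights and skip the upload when the two have drifted too far apart. This is a minimal sketch rather than the authors' implementation: the histogram-based conversion of weights into probability distributions, the threshold value, and all function names are our illustrative assumptions.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_upload(local_w, global_w, threshold=0.05, bins=100):
    """Decide whether to send the local model to the server.
    Flattened weights are histogrammed into discrete distributions;
    the 0.05 threshold is an arbitrary illustrative choice."""
    lo = min(local_w.min(), global_w.min())
    hi = max(local_w.max(), global_w.max())
    p, _ = np.histogram(local_w, bins=bins, range=(lo, hi))
    q, _ = np.histogram(global_w, bins=bins, range=(lo, hi))
    return js_divergence(p.astype(float), q.astype(float)) <= threshold

# Example: a client whose weights barely drifted still uploads.
rng = np.random.default_rng(0)
global_w = rng.normal(size=10_000)
local_w = global_w + 0.01 * rng.normal(size=10_000)
print(should_upload(local_w, global_w))  # True
```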


Data availability

No datasets were generated or analysed during the current study.

Notes

  1. The velocity update in FedBat can be viewed as a form of momentum, since it accumulates historical gradient information. Assumptions similar to those in eqns. (20)–(25) are discussed in [38, 39]; a minimal illustration of the momentum analogy follows this note.
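
As a concrete reading of this note, the sketch below contrasts a plain SGD step with a bat-style step in which the velocity carries over between rounds plus a frequency-weighted term relative to the best-known model, so the weight update accumulates historical gradient information exactly like a momentum term. The variable names, the fixed frequency, and the omission of the loudness/random-walk term are simplifying assumptions on our part.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: the update depends only on the current gradient."""
    return w - lr * grad

def bat_step(w, v, w_best, grad, freq=0.5, lr=0.01):
    """Bat-style step: v carries over from the previous round (a
    momentum-like term) and is shifted by freq * (w - w_best), the
    standard bat-algorithm convention relative to the best model.
    The random-walk/loudness term of the full algorithm is omitted."""
    v_new = v + freq * (w - w_best)   # historical velocity + pull term
    w_new = w + v_new - lr * grad     # move along velocity, then a gradient step
    return w_new, v_new
```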

References

  1. Brynjolfsson E, Li D, Raymond LR (2023) Generative ai at work. Technical report, National Bureau of Economic Research

  2. Zheng Z, Pan Z, Wang D, Zhu K, Zhao W, Guo T, Qiu X, Sun M, Bai J, Zhang F et al (2023) Bladedisc: optimizing dynamic shape machine learning workloads via compiler approach. Proc ACM Manag Data 1(3):1–29

  3. Akarvardar K, Wong H-SP (2023) Technology prospects for data-intensive computing. Proc IEEE 111(1):92–112

  4. Al-Maroof RS, Alhumaid K, Alshaafi A, Akour I, Bettayeb A, Alfaisal R, Salloum SA (2024) A comparative analysis of chatgpt and google in educational settings: understanding the influence of mediators on learning platform adoption. In: Artificial Intelligence in Education: The Power and Dangers of ChatGPT in the Classroom, vol 386, p 365

  5. Frey CB, Presidente G (2024) Privacy regulation and firm performance: estimating the gdpr effect globally. Economic Inquiry. https://doi.org/10.1111/ecin.13213

  6. McMahan B, Moore E, Ramage D, Hampson S, Arcas BA (2017) Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. PMLR, pp 1273–1282

  7. Myrzashova R, Alsamhi SH, Shvetsov AV, Hawbani A, Wei X (2023) Blockchain meets federated learning in healthcare: a systematic review with challenges and opportunities. IEEE Internet Things J 10(16):14418–14437

  8. Rahman A, Hossain MS, Muhammad G, Kundu D, Debnath T, Rahman M, Khan MSI, Tiwari P, Band SS (2023) Federated learning-based ai approaches in smart healthcare: concepts, taxonomies, challenges and open issues. Cluster Comput 26(4):2271–2311

  9. Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, Charles Z, Cormode G, Cummings R et al (2021) Advances and open problems in federated learning. Found Trends Mach Learn 14(1–2):1–210

  10. Nguyen H, Wu P, Chang JM (2023) Federated learning for distribution skewed data using sample weights. IEEE Trans Artif Intell. https://doi.org/10.1109/TAI.2023.3348073

  11. Zhao Y, Li M, Lai L, Suda N, Civin D, Chandra V (2018) Federated learning with non-iid data. arXiv preprint. https://doi.org/10.48550/arXiv.1806.00582

  12. Zhu H, Zhang H, Jin Y (2021) From federated learning to federated neural architecture search: a survey. Complex Intell Syst 7(2):639–657

  13. Zhang H, Jin Y, Cheng R, Hao K (2020) Efficient evolutionary search of attention convolutional networks via sampled training and node inheritance. IEEE Trans Evol Comput 25(2):371–385

  14. Zeng N, Li H, Wang Z, Liu W, Liu S, Alsaadi FE, Liu X (2021) Deep-reinforcement-learning-based images segmentation for quantitative analysis of gold immunochromatographic strip. Neurocomputing 425:173–180

  15. Yang X-S (2010) A new metaheuristic bat-inspired algorithm. In: Nature Inspired Cooperative Strategies for Optimization (NICSO 2010). Springer, Berlin, pp 65–74

  16. Khanduja N, Bhushan B (2021) Recent advances and application of metaheuristic algorithms: a survey (2014–2020). In: Metaheuristic and Evolutionary Computation: Algorithms and Applications. Springer, pp 207–228

  17. Woodworth B, Patel KK, Stich S, Dai Z, Bullins B, Mcmahan B, Shamir O, Srebro N (2020) Is local sgd better than minibatch sgd? In: International Conference on Machine Learning, pp. 10334–10343. PMLR

  18. Wu X, Huang F, Hu Z, Huang H (2023) Faster adaptive federated learning. Proc AAAI Conf Artif Intell 37:10379–10387

  19. Tuor T, Wang S, Ko BJ, Liu C, Leung KK (2021) Overcoming noisy and irrelevant data in federated learning. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 5020–5027. IEEE

  20. Yoshida N, Nishio T, Morikura M, Yamamoto K, Yonetani R (2020) Hybrid-fl for wireless networks: Cooperative learning mechanism using non-iid data. In: ICC 2020-2020 IEEE International Conference on Communications (ICC), pp. 1–7. IEEE

  21. Duan M, Liu D, Chen X, Tan Y, Ren J, Qiao L, Liang L (2019) Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications. In: 2019 IEEE 37th International Conference on Computer Design (ICCD), pp. 246–254. IEEE

  22. Zhang H (2017) mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412

  23. Li Z, Shao J, Mao Y, Wang JH, Zhang J (2022) Federated learning with gan-based data synthesis for non-iid clients. In: International workshop on trustworthy federated learning, pp. 17–32. Springer

  24. Reddi S, Charles Z, Zaheer M, Garrett Z, Rush K, Konečnỳ J, Kumar S, McMahan HB (2020) Adaptive federated optimization. arXiv preprint arXiv:2003.00295

  25. Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020) Federated optimization in heterogeneous networks. Proc Mach Learn Syst 2:429–450

  26. Wu H, Wang P (2021) Fast-convergent federated learning with adaptive weighting. IEEE Trans Cogn Commun Netw 7(4):1078–1088

  27. Akay B, Karaboga D, Akay R (2022) A comprehensive survey on optimizing deep learning models by metaheuristics. Artif Intell Rev 55(2):829–894

  28. Park S, Suh Y, Lee J (2021) Fedpso: federated learning using particle swarm optimization to reduce communication costs. Sensors 21(2):600

  29. Houssein EH, Gad AG, Hussain K, Suganthan PN (2021) Major advances in particle swarm optimization: theory, analysis, and application. Swarm Evol Comput 63:100868

  30. Nguyen LT, Kim J, Shim B (2021) Gradual federated learning with simulated annealing. IEEE Trans Signal Process 69:6299–6313

  31. Kumbhare S, Kathole AB, Shinde S (2023) Federated learning aided breast cancer detection with intelligent heuristic-based deep learning framework. Biomed Signal Process Control 86:105080

  32. Wang H, Kaplan Z, Niu D, Li B (2020) Optimizing federated learning on non-iid data with reinforcement learning. In: IEEE Infocom 2020-IEEE Conference on Computer Communications, pp. 1698–1707. IEEE

  33. Luping W, Wei W, Bo L (2019) Cmfl: Mitigating communication overhead for federated learning. In: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 954–964. IEEE

  34. Nguyen HT, Sehwag V, Hosseinalipour S, Brinton CG, Chiang M, Poor HV (2020) Fast-convergent federated learning. IEEE J Sel Areas Commun 39(1):201–218

  35. Kaveh M, Mesgari MS (2023) Application of meta-heuristic algorithms for training neural networks and deep learning architectures: a comprehensive review. Neural Process Lett 55(4):4519–4622

  36. Khan MS, Jabeen F, Ghouzali S, Rehman Z, Naz S, Abdul W (2021) Metaheuristic algorithms in optimizing deep neural network model for software effort estimation. IEEE Access 9:60309–60327

  37. Zeng N, Song D, Li H, You Y, Liu Y, Alsaadi FE (2021) A competitive mechanism integrated multi-objective whale optimization algorithm with differential evolution. Neurocomputing 432:170–182

  38. Liu W, Chen L, Chen Y, Zhang W (2020) Accelerating federated learning via momentum gradient descent. IEEE Trans Parallel Distrib Syst 31(8):1754–1766

  39. Fan X, Wang Y, Huo Y, Tian Z (2023) Cb-dsl: communication-efficient and byzantine-robust distributed swarm learning on non-iid data. IEEE Trans Cogn Commun Netw

  40. Wang J, Joshi G (2021) Cooperative sgd: a unified framework for the design and analysis of local-update sgd algorithms. J Mach Learn Res 22(213):1–50

  41. Bernstein J, Wang Y-X, Azizzadenesheli K, Anandkumar A (2018) signsgd: Compressed optimisation for non-convex problems. In: International Conference on Machine Learning, pp. 560–569. PMLR

  42. Zhang C, Cai Y, Lin G, Shen C (2022) Deepemd: Differentiable earth mover’s distance for few-shot learning. IEEE Trans Pattern Anal Mach Intell 45(5):5632–5648

  43. Rubner Y, Tomasi C, Guibas LJ (2000) The earth mover’s distance as a metric for image retrieval. Int J Comput Vis 40:99–121

  44. Tan AZ, Yu H, Cui L, Yang Q (2022) Towards personalized federated learning. IEEE Trans Neural Netw Learn Syst 34(12):9587–9603

  45. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223. PMLR

  46. Bobbia B, Picard M (2024) Active learning for regression based on wasserstein distance and groupsort neural networks. arXiv preprint arXiv:2403.15108

  47. Zhang Y, Pan J, Li LK, Liu W, Chen Z, Liu X, Wang J (2024) On the properties of kullback-leibler divergence between multivariate gaussian distributions. Adv Neural Inf Process Syst 36

  48. Lin J (1991) Divergence measures based on the shannon entropy. IEEE Trans Inf Theory 37(1):145–151

  49. Abasi AK, Aloqaily M, Guizani M (2022) Grey wolf optimizer for reducing communication cost of federated learning. In: GLOBECOM 2022-2022 IEEE Global Communications Conference, pp. 1049–1054. IEEE

  50. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  51. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

Author information

Contributions

J.W., C.S., and Y.P. equally contributed to the conception and design of the study, data collection and analysis, and writing of the manuscript. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Chaochao Sun.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Natural Science Foundation of China (No. 41672114).

Appendix 1

In this section, we present a convergence analysis of the proposed FedBat algorithm. Each participating client updates its local model using the bat algorithm combined with SGD. We repeatedly use the following facts, and we restate the relevant assumptions where they are needed.

  • Fact 1: \(\langle \textbf{p}, \textbf{q} \rangle = \Vert \textbf{p} \Vert \Vert \textbf{q} \Vert \cos \theta \ge -\Vert \textbf{p} \Vert \Vert \textbf{q} \Vert \)

  • Fact 2: \(\Vert \textbf{p}+\textbf{q}\Vert \le \Vert \textbf{p}\Vert +\Vert \textbf{q}\Vert\)

  • Fact 3: \(\Vert \textbf{pq}\Vert \le \Vert \textbf{p}\Vert \Vert \textbf{q}\Vert\)

  • Fact 4: For any set of vectors \(\left\{ \textbf{p}_i\right\} _{i=1}^n\) and nonnegative scalars \(\left\{ q_i\right\} _{i=1}^n\), define \(s=\sum _{i=1}^nq_i\). Then, by Jensen’s inequality, we have: \(\left| \left| \sum _{i=1}^n\frac{q_i}{s}\textbf{p}_i\right| \right| ^2\le \sum _{i=1}^n\frac{q_i}{s}\Vert \textbf{p}_i\Vert ^2\)

  • Fact 5: For any set of vectors \(\left\{ \textbf{p}_i\right\} _{i=1}^n\), we have \(\left\| \sum _{i=1}^n\textbf{p}_i\right\| ^2\le n\sum _{i=1}^n\Vert \textbf{p}_i\Vert ^2\). (A quick numerical check of Facts 4 and 5 follows this list.)
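
Facts 4 and 5 can be sanity-checked numerically; the snippet below does so on random vectors (the dimensions, weights, and seed are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
P = rng.normal(size=(n, d))          # vectors p_1 .. p_n
q = rng.uniform(0.1, 1.0, size=n)    # nonnegative scalars q_1 .. q_n
s = q.sum()

# Fact 4 (Jensen): ||sum_i (q_i/s) p_i||^2 <= sum_i (q_i/s) ||p_i||^2
lhs4 = np.linalg.norm((q[:, None] / s * P).sum(axis=0)) ** 2
rhs4 = (q / s * np.linalg.norm(P, axis=1) ** 2).sum()
assert lhs4 <= rhs4

# Fact 5: ||sum_i p_i||^2 <= n * sum_i ||p_i||^2
lhs5 = np.linalg.norm(P.sum(axis=0)) ** 2
rhs5 = n * (np.linalg.norm(P, axis=1) ** 2).sum()
assert lhs5 <= rhs5
print(f"Fact 4: {lhs4:.3f} <= {rhs4:.3f}; Fact 5: {lhs5:.3f} <= {rhs5:.3f}")
```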

Proof of Theorem 1:

According to Assumption 1, where \(F_i(\cdot )\) is L-smooth, we have:

$$F_{i}(\textbf{u}) \le F_{i}(\textbf{v}) + \langle \textbf{u} - \textbf{v},\nabla F_{i}(\textbf{v})\rangle + \frac{L}{2}\Vert \textbf{u} - \textbf{v}\Vert ^{2}$$
(38)
$$F_{i}(\textbf{w}_{t+1}) \le F_{i}(\textbf{w}_{t}) + \langle \textbf{w}_{i,t+1} - \textbf{w}_{i,t},\nabla F_{i}(\textbf{w}_{i,t})\rangle + \frac{L}{2}\Vert \textbf{w}_{i,t+1} - \textbf{w}_{i,t}\Vert ^{2}$$
(39)
$$F_{i}(\textbf{w}_{t+1}) - F_{i}(\textbf{w}_{t})\underbrace{ - \langle \textbf{vR}_{i,t+1} + p\epsilon \textbf{AR}_{t+1} - \eta \nabla F_{i}(\textbf{w}_{i,t}),\nabla F_{i}(\textbf{w}_{t})\rangle }_{A} \le \underbrace{\frac{L}{2}\Vert \textbf{w}_{i,t+1} - \textbf{w}_{i,t}\Vert ^{2}}_{B}$$
(40)

We next focus on bounding A. According to Fact 1 and eqn. (6), we have:

$$\begin{aligned} A&= \langle -\textbf{vR}_{i,t+1} - p\epsilon \textbf{AR}_{t+1} + \eta \nabla F_{i}(\textbf{w}_{i,t}),\, \nabla F_{i}(\textbf{w}_{t}) \rangle \\&= \Vert -\textbf{vR}_{i,t+1} - p\epsilon \textbf{AR}_{t+1}+\eta \nabla F_{i}(\textbf{w}_{i,t})\Vert \Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta \\&= \Vert -\textbf{vR}_{i,t} - (\textbf{w}_{i,t}-\textbf{w}_{*,t})f_{i} - p\epsilon \textbf{AR}_{t+1}+\eta \nabla F_{i}(\textbf{w}_{i,t})\Vert \Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta \end{aligned}$$
(41)

By combining eqns. (14) and (15), it follows that:

$$\begin{aligned} \underbrace{(\Vert -\textbf{vR}_{i,t} - (\textbf{vR}_{i,t}-\textbf{vR}_{*,t})f_{i} - p\epsilon \textbf{AR}_{t+1} + \eta \nabla F_{i}(\textbf{w}_{i,t})\Vert \Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta )}_{\text {A1}} \end{aligned}$$
(42)

By combining Fact 2 and eqn. (10), we have:

$$\begin{aligned} &A1\le [(\Vert \textbf{vR}_{i,t}\Vert +\Vert (\textbf{vR}_{i,t}-\textbf{vR}_{*,t})f_{i}\Vert +p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )\Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta ] \\&{\mathop {\le }\limits ^{(a)}}\underbrace{[(\Vert \textbf{vR}_{i,t}\Vert +\Vert \textbf{vR}_{i,t}\Vert f_{i}+\Vert \textbf{vR}_{*,t}\Vert f_{i}+p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )\Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta ]}_{\text {A2}} \end{aligned}$$
(43)

where (a) uses Fact 2 again.

According to eqns. (20)–(25), we have:

$${\underline{u}}\,{\underline{q}}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \le \Vert \textbf{vR}_{i,t}\Vert \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert \cos \theta _{i,t} \le {\overline{u}}\,{\overline{q}}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2$$
(44)
$${\underline{u}}_{*}\,{\underline{q}}_{*}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \le \Vert \textbf{vR}_{*,t}\Vert \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert \cos \theta _{*,t} \le {\overline{u}}_{*}\,{\overline{q}}_{*}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2$$
(45)
$${\underline{u}}_A\,{\underline{q}}_A\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2\le \Vert \textbf{A}\textbf{R}_{t}\Vert \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert \cos \theta _{A,t} \le {\overline{u}}_A\,{\overline{q}}_A\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2$$
(46)

Substituting eqns. (44)–(46) into A2, we have:

$$\begin{aligned} A2&= [(\Vert \textbf{vR}_{i,t}\Vert (1+f_{i})+\Vert \textbf{vR}_{*,t}\Vert f_{i}+p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )\Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta ] \\&\le [({\underline{u}}{\underline{q}}(1+f_{i})+{\underline{u}}_{*}{\underline{q}}_{*}+ \eta +p\epsilon \varphi {\underline{u}}_{A}{\underline{q}}_{A})\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2] \end{aligned}$$
(47)

We next aim to bound B.

$$\begin{aligned} B&= \frac{L}{2}\Vert \textbf{w}_{i,t+1}-\textbf{w}_{i,t}\Vert ^2 \\&= \frac{L}{2}\Vert \textbf{vR}_{i,t}+(\textbf{vR}_{i,t}-\textbf{vR}_{*,t})f_{i}+p\epsilon \varphi \textbf{AR}_{t}-\eta \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \end{aligned}$$
(48)

By combining Fact 2 and eqn. (11), we have:

$$\begin{aligned} B&\le \frac{L}{2}(\Vert \textbf{vR}_{i,t}\Vert +\Vert \textbf{vR}_{i,t}\Vert f_{i}+\Vert \textbf{vR}_{*,t}\Vert f_{i}+p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )^2 \\&{\mathop {\le }\limits ^{(b)}}\underbrace{\frac{5L}{2}\left( \Vert \textbf{vR}_{i,t}\Vert ^2 + {f_i}^2 \Vert \textbf{vR}_{i,t}\Vert ^2 + {f_i}^2\Vert \textbf{vR}_{*,t}\Vert ^2 + \eta ^2 \Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2 + p^2 \epsilon ^2 \varphi ^2 \Vert \textbf{AR}_t\Vert ^2\right) }_{\text {B1}} \end{aligned}$$
(49)

where (b) uses Fact 5 to expand and bound the quadratic term.

According to eqns. (23)–(25), we have:

$$\Vert \textbf{vR}_{i,t-1}\Vert \le {\overline{u}}\Vert \nabla F_{i}(\textbf{w}_{i,t-1})\Vert$$
(50)
$$\Vert \textbf{vR}_{*,t-1}\Vert \le {\overline{u}}_{*}\Vert \nabla F_{i}(\textbf{w}_{i,t-1})\Vert$$
(51)
$$\Vert \textbf{A}\textbf{R}_{t-1}\Vert \le {\overline{u}}_A\Vert \nabla F_{i}(\textbf{w}_{i,t-1})\Vert$$
(52)

Substituting eqns. (50)–(52) into B1, we have:

$$\begin{aligned} B1\le \frac{5L}{2} \left( {\overline{u}}^2(1 + {f_i}^2) + {f_i}^2{\overline{u}}_*^2 + \eta ^2 + p^2 \epsilon ^2 \varphi ^2 {\overline{u}}_A^2 \right) \Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2 \end{aligned}$$
(53)

Combining eqns. (47) and (53), we have:

$$\begin{aligned} &F_{i}(\textbf{w}_{t+1}) - F_{i}(\textbf{w}_{t}) +[{\underline{u}}{\underline{q}}(1+f_{i})+{\underline{u}}_{*}{\underline{q}}_{*}+ \eta +p\epsilon \varphi {\underline{u}}_{A}{\underline{q}}_{A}]\Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2- \\&[\frac{5L}{2} \left( {\overline{u}}^2(1 + {f_i}^2) + {f_i}^2{\overline{u}}_*^2 + \eta ^2 + p^2 \epsilon ^2 \varphi ^2 {\overline{u}}_A^2 \right) ]\Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2\le 0 \end{aligned}$$
(54)

Let \(\Phi ={\underline{u}}{\underline{q}}(1+f_{i})+{\underline{u}}_{*}{\underline{q}}_{*}+\eta +p\epsilon \varphi {\underline{u}}_{A}{\underline{q}}_{A}-\frac{5L}{2}[{\overline{u}}^2(1+{f_i}^2)+{f_i}^2{\overline{u}}_*^2 + \eta ^2 + p^2 \epsilon ^2 \varphi ^2{\overline{u}}_A^2]\), assumed positive; then:

$$\begin{aligned} &F_{i}(\textbf{w}_{t+1})-F_{i}(\textbf{w}_{t})+\Phi \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2\le 0 \\&\Leftrightarrow \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \le \frac{[F_{i}(\textbf{w}_{t})-F_{i}(\textbf{w}_{t+1})]}{\Phi } \end{aligned}$$
(55)

Taking expectations on both sides:

$$\begin{aligned} {\mathbb {E}}\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2\le \frac{{\mathbb {E}}[F_{i}(\textbf{w}_{t})-F_{i}(\textbf{w}_{t+1})]}{\Phi } \end{aligned}$$
(56)

Applying eqn. (56) repeatedly, we have:

$$\begin{aligned} &{\mathbb {E}}\Vert \nabla F_{i}(\textbf{w}_{i,1})\Vert ^2\le \frac{{\mathbb {E}}[F_{i}(\textbf{w}_{1})-F_{i}(\textbf{w}_{2})]}{\Phi }\\&{\mathbb {E}}\Vert \nabla F_{i}(\textbf{w}_{i,2})\Vert ^2\le \frac{{\mathbb {E}}[F_{i}(\textbf{w}_{2})-F_{i}(\textbf{w}_{3})]}{\Phi }\\&\hspace{7em} \vdots \end{aligned}$$

Summing these inequalities, we have:

$$\begin{aligned} \sum _{t=1}^T{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_{t})\right\| ^2 \le \frac{F_{1}-F_{*}}{\Phi } \end{aligned}$$
(57)

Dividing both sides by T, we have:

$$\begin{aligned} \min _{t=1:T}{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2\le \frac{\sum _{t=1}^T{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2}{T}&\le \frac{F_{1}-F_{*}}{\Phi T} \end{aligned}$$
(58)

Assume that \(F_{1}-F_{*}\le D\). Then, if we set \(\eta \le \frac{1}{5L}\), we get:

$$\begin{aligned} \min _{t=1:T}{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2\le \frac{5LD}{T} \end{aligned}$$
(59)

Hence, the convergence rate of FedBat is \({\mathcal {O}}(\frac{1}{T})\).
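
For readability, we spell out the step from eqn. (58) to eqn. (59); it implicitly assumes that the bat-related terms in \(\Phi\) are nonnegative, so that \(\eta \le \frac{1}{5L}\) yields \(\Phi \ge \frac{1}{5L}\) (this intermediate chain is our reading, not part of the original derivation):

$$\begin{aligned} \min _{t=1:T}{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2\le \frac{F_{1}-F_{*}}{\Phi T}\le \frac{D}{\Phi T}\le \frac{D}{\frac{1}{5L}\,T}=\frac{5LD}{T} \end{aligned}$$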

Proof of Theorem 2:

In the proof of Theorem 1, we derived eqn. (59), which provides the convergence bound for FedBat. To strengthen this guarantee, we introduce the Polyak–Łojasiewicz (PL) condition. Starting from eqn. (56), we have:

$$\begin{aligned} &{\mathbb {E}}[F(\textbf{w}_{t+1})-F(\textbf{w}_{t})]+\Phi {\mathbb {E}}[\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2]\le 0 \\&\Leftrightarrow {\mathbb {E}}[F(\textbf{w}_{t+1})-F(\textbf{w}_{*})-(F(\textbf{w}_{t})-F(\textbf{w}_{*}))]+\Phi {\mathbb {E}}[\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2]\le 0 \end{aligned}$$
(60)

Combining this with Assumption 2:

$$\begin{aligned} &{\mathbb {E}}[F(\textbf{w}_{t+1})-F(\textbf{w}_{*})]\le {\mathbb {E}}[F(\textbf{w}_{t})-F(\textbf{w}_{*})]-\Phi \mu {\mathbb {E}}[F(\textbf{w}_t)-F(\textbf{w}_{*})]\\&\hspace{9em}\le (1-\Phi \mu ){\mathbb {E}}[F(\textbf{w}_t)-F(\textbf{w}_{*})]\\&\hspace{9em}\le (1-\Phi \mu )^2{\mathbb {E}}[F(\textbf{w}_{t-1})-F(\textbf{w}_{*})]\\&\hspace{12em} \vdots \\&\hspace{9em}\le (1-\Phi \mu )^{t+1}{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})] \end{aligned}$$

To make this value less than or equal to \(\Lambda\), the required number of rounds \(T_{\Lambda }\) satisfies:

$$\begin{aligned}&(1-\Phi \mu )^{T_{\Lambda }}{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})] \le \Lambda \\&\Rightarrow T_{\Lambda }={\mathcal {O}}\left( \frac{1}{\Phi \mu }\log \frac{{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})]}{\Lambda }\right) \end{aligned}$$
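
The stated order of \(T_{\Lambda }\) follows by using \(1-x\le e^{-x}\) for \(x\in (0,1)\); we add this intermediate inequality for completeness:

$$\begin{aligned} (1-\Phi \mu )^{T_{\Lambda }}\le e^{-\Phi \mu T_{\Lambda }}\le \frac{\Lambda }{{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})]} \;\Longleftarrow \; T_{\Lambda }\ge \frac{1}{\Phi \mu }\log \frac{{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})]}{\Lambda } \end{aligned}$$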

Proof of Theorem 3:

$$\begin{aligned} \Vert \textbf{w}_{i,t+1}-\textbf{w}_{*,t+1}\Vert&=\Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}+\textbf{vR}_{i,t+1}-\textbf{vR}_{*,t+1}-\eta \nabla F_{i}(\textbf{w}_{i,t})+\eta \nabla F(\textbf{w}_{*,t})\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert \textbf{vR}_{i,t+1}-\textbf{vR}_{*,t+1}\Vert + \left\| \eta \nabla F_i\big (\textbf{w}_{i,t}\big )-\eta \nabla F\big (\textbf{w}_{*,t}\big )\right\| \end{aligned}$$
(61)

Based on the definitions of the gradients at the local clients and at the virtual client, we have \(\nabla F_i(\textbf{w}_{i,t})=\sum _{s=1}^Sm_i(b{=}s)\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{i,t})]\) and \(\nabla F(\textbf{w}_{*,t})=\sum _{s=1}^Sm(b{=}s)\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{*,t})]\).

$$\begin{aligned} &\left\| \nabla F_i\big (\textbf{w}_{i,t}\big )-\nabla F\big (\textbf{w}_{*,t}\big )\right\| \\&=\left\| \sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{i,t})]-\sum _{s=1}^Sm(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{*,t})]\right\| \\&=\left\| \sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{i,t})]-\sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{*,t})] \right. \\&\quad \left. +\sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}\left[ I_s\left( \textbf{a},\textbf{w}_{*,t}\right) \right] -\sum _{s=1}^Sm(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}\left[ I_s\left( \textbf{a},\textbf{w}_{*,t}\right) \right] \right\| \\&\le \left\| \sum _{s=1}^Sm_i(b{=}s)(\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{i,t})]-\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{*,t})])\right\| \\&\quad +\left\| \sum _{s=1}^S(m_i(b{=}s){-}m(b{=}s))\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{*,t})]\right\| \end{aligned}$$
(62)

Let \(I_{\max }(\textbf{w}_{*,t}) = \max \{\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a}, \textbf{w}_{*,t})]\}_{s=1}^S\). Using the Lipschitz continuity \(\Vert \nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_1)]-\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_2)]\Vert \le L_s\Vert \textbf{w}_1-\textbf{w}_2\Vert\), the right-hand side of eqn. (62) can be further bounded as:

$$\begin{aligned} &\Vert \nabla F_i\left( \textbf{w}_{i,t}\right) -\nabla F(\textbf{w}_{*,t})\Vert \\&\le \sum _{s=1}^Sm_i(b{=}s)L_s\Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert + I_{\max }\big (\textbf{w}_{*,t}\big )\sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \end{aligned}$$
(63)

Then, we have:

$$\begin{aligned}&\Vert \textbf{w}_{i,t+1}-\textbf{w}_{*,t+1}\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert \textbf{vR}_{i,t+1}-\textbf{vR}_{*,t+1}\Vert +\eta \Vert \nabla F_i\big (\textbf{w}_{i,t}\big )-\nabla F\big (\textbf{w}_{*,t}\big )\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert (1+f_{i})\textbf{vR}_{i,t}-(1+f_{i})\textbf{vR}_{*,t}\Vert +\eta \Vert \nabla F_i\big (\textbf{w}_{i,t}\big )-\nabla F\big (\textbf{w}_{*,t}\big )\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert (1+f_{i})\textbf{vR}_{i,t}-(1+f_{i})\textbf{vR}_{*,t}\Vert \\&\quad +\eta \sum _{s=1}^Sm_i(b{=}s)L_s\Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\eta I_{\max }\big (\textbf{w}_{*,t}\big )\sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \\&=\left( 1+\eta \sum _{s=1}^Sm_i(b{=}s)L_s\right) \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +(1+f_{i})\Vert \textbf{vR}_{i,t}-\textbf{vR}_{*,t}\Vert \\&\quad +\eta I_{\max }\big (\textbf{w}_{*,t}\big )\sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \end{aligned}$$
(64)

Letting \(\beta = 1 + \eta \sum _{s=1}^S m_i(b = s)L_s\), we rewrite eqn. (64) as:

$$\begin{aligned} &\Vert \textbf{w}_{i,t+1}-\textbf{w}_{*,t+1}\Vert \le \beta \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +(1+f_{i})\Vert \textbf{vR}_{i,t}-\textbf{vR}_{*,t}\Vert \\&\quad +\eta I_{\max }\left( \textbf{w}_{*,t}\right) \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \\&\le \beta ^2\Vert \textbf{w}_{i,t-1}-\textbf{w}_{*,t-1}\Vert +\beta (1+f_{i})\Vert \textbf{vR}_{i,t-1}-\textbf{vR}_{*,t-1}\Vert \\&\quad +\beta \eta I_{\max }\left( \textbf{w}_{*,t-1}\right) \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)|+(1+f_{i})\Vert \textbf{vR}_{i,t}-\textbf{vR}_{*,t}\Vert \\&\quad +\eta I_{\max }\left( \textbf{w}_{*,t}\right) \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \\&\le \beta ^{t+1}\Vert \textbf{w}_{i,0}-\textbf{w}_{*,0}\Vert +(1+f_{i})\sum _{j=0}^t\beta ^{t-j}\Vert \textbf{vR}_{i,j}-\textbf{vR}_{*,j}\Vert \\&\quad +\eta \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)|\sum _{j=0}^t\beta ^{t-j}I_{\max }\left( \textbf{w}_{*,j}\right) \end{aligned}$$
(65)

This completes the proof of Theorem 3.
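
To see how this bound behaves, the sketch below evaluates the right-hand side of eqn. (65) with scalar stand-ins; all constants (\(\beta\), \(f_i\), the velocity gaps, and the label-distribution gap) are made-up inputs, chosen only to show that the bound compounds geometrically in \(\beta\) and grows with the distribution mismatch.

```python
import numpy as np

def divergence_bound(T, beta=1.05, f_i=0.5, delta0=0.0,
                     vel_gap=0.01, eta=0.1, I_max=1.0, dist_gap=0.4):
    """Right-hand side of eqn. (65) with scalar stand-ins.
    `vel_gap` stands in for ||vR_{i,j} - vR_{*,j}|| and `dist_gap`
    for sum_s |m_i(b=s) - m(b=s)|; both held constant over rounds."""
    bounds = []
    for t in range(T):
        powers = beta ** np.arange(t, -1, -1)      # beta^{t-j}, j = 0..t
        bound = (beta ** (t + 1) * delta0
                 + (1 + f_i) * vel_gap * powers.sum()
                 + eta * I_max * dist_gap * powers.sum())
        bounds.append(bound)
    return np.array(bounds)

print(divergence_bound(5).round(3))  # the bound grows round over round
```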

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Sun, C. & Peng, Y. FedBat: a self-adapting bat algorithm-based federated learning approach. J Supercomput 81, 137 (2025). https://doi.org/10.1007/s11227-024-06514-x
