FedBat: a self-adapting bat algorithm-based federated learning approach

The Journal of Supercomputing

Abstract

Federated learning (FL) is an advanced distributed machine learning (ML) framework designed to address data silos and data privacy. In real-world applications, common problems such as non-convex optimization and non-independent and identically distributed (Non-IID) client data reduce training efficiency, trap training in local optima, and degrade performance. We therefore propose an FL scheme based on the bat algorithm (FedBat), which leverages the echolocation mechanism of bats to balance global and local search, enabling the algorithm to escape local optima with a certain probability. By combining global optimal model weight optimization with a dynamically adjusted search strategy, FedBat guides weaker client models toward the global optimum, thereby accelerating convergence. FedBat also adapts its parameters across different datasets. To mitigate client drift, we extend FedBat with the Jensen–Shannon (JS) divergence to quantify the difference between local and global models; each client decides whether to upload its local model based on this divergence, which improves the global model's generalization and reduces communication overhead. Experimental results demonstrate that FedBat converges 5 times faster and improves test accuracy by more than 40% compared to FedAvg. The extended FedBat effectively mitigates the loss in the global model's generalization performance and reduces communication costs by approximately 20%. Comparisons with FedPso, FedGwo, and FedProx show that FedBat is superior in both convergence speed and test accuracy. Furthermore, we derive the formula for the expected convergence rate of FedBat, analyze the impact of various parameters on FL performance, and establish an upper bound on FedBat's model divergence.
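
To make the divergence-gated upload concrete, the sketch below shows how a client might compute a JS divergence between its local weights and the global weights and skip the upload when the two have drifted too far apart. This is a minimal sketch rather than the authors' implementation: the histogram-based conversion of weights into probability distributions, the threshold value, and all function names are our illustrative assumptions.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_upload(local_w, global_w, threshold=0.05, bins=100):
    """Decide whether to send the local model to the server.
    Flattened weights are histogrammed into discrete distributions;
    the 0.05 threshold is an arbitrary illustrative choice."""
    lo = min(local_w.min(), global_w.min())
    hi = max(local_w.max(), global_w.max())
    p, _ = np.histogram(local_w, bins=bins, range=(lo, hi))
    q, _ = np.histogram(global_w, bins=bins, range=(lo, hi))
    return js_divergence(p.astype(float), q.astype(float)) <= threshold

# Example: a client whose weights barely drifted still uploads.
rng = np.random.default_rng(0)
global_w = rng.normal(size=10_000)
local_w = global_w + 0.01 * rng.normal(size=10_000)
print(should_upload(local_w, global_w))  # True
```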


Data availability

No datasets were generated or analysed during the current study.

Notes

  1. The velocity update in FedBat can be viewed as a form of momentum, since it accumulates historical gradient information. Assumptions similar to those in eqns. (20)–(25) are discussed in [38, 39]; a minimal illustration of the momentum analogy follows this note.
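
As a concrete reading of this note, the sketch below contrasts a plain SGD step with a bat-style step in which the velocity carries over between rounds plus a frequency-weighted term relative to the best-known model, so the weight update accumulates historical gradient information exactly like a momentum term. The variable names, the fixed frequency, and the omission of the loudness/random-walk term are simplifying assumptions on our part.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: the update depends only on the current gradient."""
    return w - lr * grad

def bat_step(w, v, w_best, grad, freq=0.5, lr=0.01):
    """Bat-style step: v carries over from the previous round (a
    momentum-like term) and is shifted by freq * (w - w_best), the
    standard bat-algorithm convention relative to the best model.
    The random-walk/loudness term of the full algorithm is omitted."""
    v_new = v + freq * (w - w_best)   # historical velocity + pull term
    w_new = w + v_new - lr * grad     # move along velocity, then a gradient step
    return w_new, v_new
```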

References

  1. Brynjolfsson E, Li D, Raymond LR (2023) Generative ai at work. Technical report, National Bureau of Economic Research

  2. Zheng Z, Pan Z, Wang D, Zhu K, Zhao W, Guo T, Qiu X, Sun M, Bai J, Zhang F et al (2023) Bladedisc: optimizing dynamic shape machine learning workloads via compiler approach. Proc ACM Manag Data 1(3):1–29

  3. Akarvardar K, Wong H-SP (2023) Technology prospects for data-intensive computing. Proc IEEE 111(1):92–112

  4. Al-Maroof RS, Alhumaid K, Alshaafi A, Akour I, Bettayeb A, Alfaisal R, Salloum SA (2024) A comparative analysis of chatgpt and google in educational settings: understanding the influence of mediators on learning platform adoption. In: Artificial Intelligence in Education: The Power and Dangers of ChatGPT in the Classroom, vol 386, p 365

  5. Frey CB, Presidente G (2024) Privacy regulation and firm performance: estimating the gdpr effect globally. Economic Inquiry. https://doi.org/10.1111/ecin.13213

  6. McMahan B, Moore E, Ramage D, Hampson S, Arcas BA (2017) Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. PMLR, pp 1273–1282

  7. Myrzashova R, Alsamhi SH, Shvetsov AV, Hawbani A, Wei X (2023) Blockchain meets federated learning in healthcare: a systematic review with challenges and opportunities. IEEE Internet Things J 10(16):14418–14437

  8. Rahman A, Hossain MS, Muhammad G, Kundu D, Debnath T, Rahman M, Khan MSI, Tiwari P, Band SS (2023) Federated learning-based ai approaches in smart healthcare: concepts, taxonomies, challenges and open issues. Cluster Comput 26(4):2271–2311

  9. Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, Charles Z, Cormode G, Cummings R et al (2021) Advances and open problems in federated learning. Found Trends Mach Learn 14(1–2):1–210

  10. Nguyen H, Wu P, Chang JM (2023) Federated learning for distribution skewed data using sample weights. IEEE Trans Artif Intell. https://doi.org/10.1109/TAI.2023.3348073

  11. Zhao Y, Li M, Lai L, Suda N, Civin D, Chandra V (2018) Federated learning with non-iid data. arXiv preprint. https://doi.org/10.48550/arXiv.1806.00582

  12. Zhu H, Zhang H, Jin Y (2021) From federated learning to federated neural architecture search: a survey. Complex Intell Syst 7(2):639–657

  13. Zhang H, Jin Y, Cheng R, Hao K (2020) Efficient evolutionary search of attention convolutional networks via sampled training and node inheritance. IEEE Trans Evol Comput 25(2):371–385

  14. Zeng N, Li H, Wang Z, Liu W, Liu S, Alsaadi FE, Liu X (2021) Deep-reinforcement-learning-based images segmentation for quantitative analysis of gold immunochromatographic strip. Neurocomputing 425:173–180

  15. Yang X-S (2010) A new metaheuristic bat-inspired algorithm. In: Nature Inspired Cooperative Strategies for Optimization (NICSO 2010). Springer, Berlin, pp 65–74

  16. Khanduja N, Bhushan B (2021) Recent advances and application of metaheuristic algorithms: a survey (2014–2020). In: Metaheuristic and Evolutionary Computation: Algorithms and Applications. Springer, pp 207–228

  17. Woodworth B, Patel KK, Stich S, Dai Z, Bullins B, Mcmahan B, Shamir O, Srebro N (2020) Is local sgd better than minibatch sgd? In: International Conference on Machine Learning, pp. 10334–10343. PMLR

  18. Wu X, Huang F, Hu Z, Huang H (2023) Faster adaptive federated learning. Proc AAAI Conf Artif Intell 37:10379–10387

  19. Tuor T, Wang S, Ko BJ, Liu C, Leung KK (2021) Overcoming noisy and irrelevant data in federated learning. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 5020–5027. IEEE

  20. Yoshida N, Nishio T, Morikura M, Yamamoto K, Yonetani R (2020) Hybrid-fl for wireless networks: Cooperative learning mechanism using non-iid data. In: ICC 2020-2020 IEEE International Conference on Communications (ICC), pp. 1–7. IEEE

  21. Duan M, Liu D, Chen X, Tan Y, Ren J, Qiao L, Liang L (2019) Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications. In: 2019 IEEE 37th International Conference on Computer Design (ICCD), pp. 246–254. IEEE

  22. Zhang H (2017) mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412

  23. Li Z, Shao J, Mao Y, Wang JH, Zhang J (2022) Federated learning with gan-based data synthesis for non-iid clients. In: International workshop on trustworthy federated learning, pp. 17–32. Springer

  24. Reddi S, Charles Z, Zaheer M, Garrett Z, Rush K, Konečnỳ J, Kumar S, McMahan HB (2020) Adaptive federated optimization. arXiv preprint arXiv:2003.00295

  25. Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020) Federated optimization in heterogeneous networks. Proc Mach Learn Syst 2:429–450

  26. Wu H, Wang P (2021) Fast-convergent federated learning with adaptive weighting. IEEE Trans Cogn Commun Netw 7(4):1078–1088

  27. Akay B, Karaboga D, Akay R (2022) A comprehensive survey on optimizing deep learning models by metaheuristics. Artif Intell Rev 55(2):829–894

  28. Park S, Suh Y, Lee J (2021) Fedpso: federated learning using particle swarm optimization to reduce communication costs. Sensors 21(2):600

  29. Houssein EH, Gad AG, Hussain K, Suganthan PN (2021) Major advances in particle swarm optimization: theory, analysis, and application. Swarm Evol Comput 63:100868

  30. Nguyen LT, Kim J, Shim B (2021) Gradual federated learning with simulated annealing. IEEE Trans Signal Process 69:6299–6313

  31. Kumbhare S, Kathole AB, Shinde S (2023) Federated learning aided breast cancer detection with intelligent heuristic-based deep learning framework. Biomed Signal Process Control 86:105080

  32. Wang H, Kaplan Z, Niu D, Li B (2020) Optimizing federated learning on non-iid data with reinforcement learning. In: IEEE Infocom 2020-IEEE Conference on Computer Communications, pp. 1698–1707. IEEE

  33. Luping W, Wei W, Bo L (2019) Cmfl: Mitigating communication overhead for federated learning. In: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 954–964. IEEE

  34. Nguyen HT, Sehwag V, Hosseinalipour S, Brinton CG, Chiang M, Poor HV (2020) Fast-convergent federated learning. IEEE J Sel Areas Commun 39(1):201–218

  35. Kaveh M, Mesgari MS (2023) Application of meta-heuristic algorithms for training neural networks and deep learning architectures: a comprehensive review. Neural Process Lett 55(4):4519–4622

  36. Khan MS, Jabeen F, Ghouzali S, Rehman Z, Naz S, Abdul W (2021) Metaheuristic algorithms in optimizing deep neural network model for software effort estimation. IEEE Access 9:60309–60327

  37. Zeng N, Song D, Li H, You Y, Liu Y, Alsaadi FE (2021) A competitive mechanism integrated multi-objective whale optimization algorithm with differential evolution. Neurocomputing 432:170–182

  38. Liu W, Chen L, Chen Y, Zhang W (2020) Accelerating federated learning via momentum gradient descent. IEEE Trans Parallel Distrib Syst 31(8):1754–1766

  39. Fan X, Wang Y, Huo Y, Tian Z (2023) Cb-dsl: communication-efficient and byzantine-robust distributed swarm learning on non-iid data. IEEE Trans Cogn Commun Netw

  40. Wang J, Joshi G (2021) Cooperative sgd: a unified framework for the design and analysis of local-update sgd algorithms. J Mach Learn Res 22(213):1–50

  41. Bernstein J, Wang Y-X, Azizzadenesheli K, Anandkumar A (2018) signsgd: Compressed optimisation for non-convex problems. In: International Conference on Machine Learning, pp. 560–569. PMLR

  42. Zhang C, Cai Y, Lin G, Shen C (2022) Deepemd: Differentiable earth mover’s distance for few-shot learning. IEEE Trans Pattern Anal Mach Intell 45(5):5632–5648

  43. Rubner Y, Tomasi C, Guibas LJ (2000) The earth mover’s distance as a metric for image retrieval. Int J Comput Vis 40:99–121

  44. Tan AZ, Yu H, Cui L, Yang Q (2022) Towards personalized federated learning. IEEE Trans Neural Netw Learn Syst 34(12):9587–9603

  45. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223. PMLR

  46. Bobbia B, Picard M (2024) Active learning for regression based on wasserstein distance and groupsort neural networks. arXiv preprint arXiv:2403.15108

  47. Zhang Y, Pan J, Li LK, Liu W, Chen Z, Liu X, Wang J (2024) On the properties of kullback-leibler divergence between multivariate gaussian distributions. Adv Neural Inf Process Syst 36

  48. Lin J (1991) Divergence measures based on the shannon entropy. IEEE Trans Inf Theory 37(1):145–151

  49. Abasi AK, Aloqaily M, Guizani M (2022) Grey wolf optimizer for reducing communication cost of federated learning. In: GLOBECOM 2022-2022 IEEE Global Communications Conference, pp. 1049–1054. IEEE

  50. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  51. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

Author information

Contributions

J.W., C.S., and Y.P. equally contributed to the conception and design of the study, data collection and analysis, and writing of the manuscript. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Chaochao Sun.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Natural Science Foundation of China (No. 41672114).

Appendix 1

In this section, we present a convergence analysis of the proposed FedBat algorithm. Each participating client updates its local model using the bat algorithm combined with SGD. We repeatedly use the following facts, and we restate the relevant assumptions where they are needed.

  • Fact 1: \(\langle \textbf{p}, \textbf{q} \rangle = \Vert \textbf{p} \Vert \Vert \textbf{q} \Vert \cos \theta \ge -\Vert \textbf{p} \Vert \Vert \textbf{q} \Vert \)

  • Fact 2: \(\Vert \textbf{p}+\textbf{q}\Vert \le \Vert \textbf{p}\Vert +\Vert \textbf{q}\Vert\)

  • Fact 3: \(\Vert \textbf{pq}\Vert \le \Vert \textbf{p}\Vert \Vert \textbf{q}\Vert\)

  • Fact 4: For any set of vectors \(\left\{ \textbf{p}_i\right\} _{i=1}^n\) and nonnegative scalars \(\left\{ q_i\right\} _{i=1}^n\), define \(s=\sum _{i=1}^nq_i\). Then, by Jensen’s inequality, we have: \(\left| \left| \sum _{i=1}^n\frac{q_i}{s}\textbf{p}_i\right| \right| ^2\le \sum _{i=1}^n\frac{q_i}{s}\Vert \textbf{p}_i\Vert ^2\)

  • Fact 5: For any set of vectors \(\left\{ \textbf{p}_i\right\} _{i=1}^n\), we have \(\left\| \sum _{i=1}^n\textbf{p}_i\right\| ^2\le n\sum _{i=1}^n\Vert \textbf{p}_i\Vert ^2\). (A quick numerical check of Facts 4 and 5 follows this list.)
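
Facts 4 and 5 can be sanity-checked numerically; the snippet below does so on random vectors (the dimensions, weights, and seed are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
P = rng.normal(size=(n, d))          # vectors p_1 .. p_n
q = rng.uniform(0.1, 1.0, size=n)    # nonnegative scalars q_1 .. q_n
s = q.sum()

# Fact 4 (Jensen): ||sum_i (q_i/s) p_i||^2 <= sum_i (q_i/s) ||p_i||^2
lhs4 = np.linalg.norm((q[:, None] / s * P).sum(axis=0)) ** 2
rhs4 = (q / s * np.linalg.norm(P, axis=1) ** 2).sum()
assert lhs4 <= rhs4

# Fact 5: ||sum_i p_i||^2 <= n * sum_i ||p_i||^2
lhs5 = np.linalg.norm(P.sum(axis=0)) ** 2
rhs5 = n * (np.linalg.norm(P, axis=1) ** 2).sum()
assert lhs5 <= rhs5
print(f"Fact 4: {lhs4:.3f} <= {rhs4:.3f}; Fact 5: {lhs5:.3f} <= {rhs5:.3f}")
```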

Proof of Theorem 1:

According to Assumption 1, where \(F_i(\cdot )\) is L-smooth, we have:

$$F_{i}(\textbf{u}) \le F_{i}(\textbf{v}) + \langle \textbf{u} - \textbf{v},\nabla F_{i}(\textbf{v})\rangle + \frac{L}{2}\Vert \textbf{u} - \textbf{v}\Vert ^{2}$$
(38)
$$F_{i}(\textbf{w}_{t+1}) \le F_{i}(\textbf{w}_{t}) + \langle \textbf{w}_{i,t+1} - \textbf{w}_{i,t},\nabla F_{i}(\textbf{w}_{i,t})\rangle + \frac{L}{2}\Vert \textbf{w}_{i,t+1} - \textbf{w}_{i,t}\Vert ^{2}$$
(39)
$$F_{i}(\textbf{w}_{t+1}) - F_{i}(\textbf{w}_{t})\underbrace{ - \langle \textbf{vR}_{i,t+1} + p\epsilon \textbf{AR}_{t+1} - \eta \nabla F_{i}(\textbf{w}_{i,t}),\nabla F_{i}(\textbf{w}_{t})\rangle }_{A} \le \underbrace{\frac{L}{2}\Vert \textbf{w}_{i,t+1} - \textbf{w}_{i,t}\Vert ^{2}}_{B}$$
(40)

We next focus on bounding A. According to Fact 1 and eqn. (6), we have:

$$\begin{aligned} A&= \langle -\textbf{vR}_{i,t+1} - p\epsilon \textbf{AR}_{t+1} + \eta \nabla F_{i}(\textbf{w}_{i,t}),\, \nabla F_{i}(\textbf{w}_{t}) \rangle \\&= \Vert -\textbf{vR}_{i,t+1} - p\epsilon \textbf{AR}_{t+1}+\eta \nabla F_{i}(\textbf{w}_{i,t})\Vert \Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta \\&= \Vert -\textbf{vR}_{i,t} - (\textbf{w}_{i,t}-\textbf{w}_{*,t})f_{i} - p\epsilon \textbf{AR}_{t+1}+\eta \nabla F_{i}(\textbf{w}_{i,t})\Vert \Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta \end{aligned}$$
(41)

By combining eqns. (14) and (15), it follows that:

$$\begin{aligned} \underbrace{(\Vert -\textbf{vR}_{i,t} - (\textbf{vR}_{i,t}-\textbf{vR}_{*,t})f_{i} - p\epsilon \textbf{AR}_{t+1} + \eta \nabla F_{i}(\textbf{w}_{i,t})\Vert \Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta )}_{\text {A1}} \end{aligned}$$
(42)

By combining Fact 2 and eqn. (10), we have:

$$\begin{aligned} &A1\le [(\Vert \textbf{vR}_{i,t}\Vert +\Vert (\textbf{vR}_{i,t}-\textbf{vR}_{*,t})f_{i}\Vert +p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )\Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta ] \\&{\mathop {\le }\limits ^{(a)}}\underbrace{[(\Vert \textbf{vR}_{i,t}\Vert +\Vert \textbf{vR}_{i,t}\Vert f_{i}+\Vert \textbf{vR}_{*,t}\Vert f_{i}+p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )\Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta ]}_{\text {A2}} \end{aligned}$$
(43)

where (a) uses Fact 2 again.

According to eqns. (20)–(25), we have:

$${\underline{u}}\,{\underline{q}}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \le \Vert \textbf{vR}_{i,t}\Vert \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert \cos \theta _{i,t} \le {\overline{u}}\,{\overline{q}}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2$$
(44)
$${\underline{u}}_{*}\,{\underline{q}}_{*}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \le \Vert \textbf{vR}_{*,t}\Vert \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert \cos \theta _{*,t} \le {\overline{u}}_{*}\,{\overline{q}}_{*}\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2$$
(45)
$${\underline{u}}_A\,{\underline{q}}_A\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2\le \Vert \textbf{A}\textbf{R}_{t}\Vert \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert \cos \theta _{A,t} \le {\overline{u}}_A\,{\overline{q}}_A\,\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2$$
(46)

Substituting eqns. (44)–(46) into A2, we have:

$$\begin{aligned} A2&= [(\Vert \textbf{vR}_{i,t}\Vert (1+f_{i})+\Vert \textbf{vR}_{*,t}\Vert f_{i}+p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )\Vert \nabla F_{i}(\textbf{w}_{t})\Vert \cos \theta ] \\&\le [({\underline{u}}{\underline{q}}(1+f_{i})+{\underline{u}}_{*}{\underline{q}}_{*}+ \eta +p\epsilon \varphi {\underline{u}}_{A}{\underline{q}}_{A})\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2] \end{aligned}$$
(47)

We next aim to bound B.

$$\begin{aligned} B&= \frac{L}{2}\Vert \textbf{w}_{i,t+1}-\textbf{w}_{i,t}\Vert ^2 \\&= \frac{L}{2}\Vert \textbf{vR}_{i,t}+(\textbf{vR}_{i,t}-\textbf{vR}_{*,t})f_{i}+p\epsilon \varphi \textbf{AR}_{t}-\eta \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \end{aligned}$$
(48)

By combining Fact 2 and eqn. (11), we have:

$$\begin{aligned} B&\le \frac{L}{2}(\Vert \textbf{vR}_{i,t}\Vert +\Vert \textbf{vR}_{i,t}\Vert f_{i}+\Vert \textbf{vR}_{*,t}\Vert f_{i}+p\epsilon \varphi \Vert \textbf{AR}_{t}\Vert +\eta \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert )^2 \\&{\mathop {\le }\limits ^{(b)}}\underbrace{\frac{5L}{2}\left( \Vert \textbf{vR}_{i,t}\Vert ^2 + {f_i}^2 \Vert \textbf{vR}_{i,t}\Vert ^2 + {f_i}^2\Vert \textbf{vR}_{*,t}\Vert ^2 + \eta ^2 \Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2 + p^2 \epsilon ^2 \varphi ^2 \Vert \textbf{AR}_t\Vert ^2\right) }_{\text {B1}} \end{aligned}$$
(49)

where (b) uses Fact 5 to expand and bound the quadratic term.

According to eqns. (23)–(25), we have:

$$\Vert \textbf{vR}_{i,t-1}\Vert \le {\overline{u}}\Vert \nabla F_{i}(\textbf{w}_{i,t-1})\Vert$$
(50)
$$\Vert \textbf{vR}_{*,t-1}\Vert \le {\overline{u}}_{*}\Vert \nabla F_{i}(\textbf{w}_{i,t-1})\Vert$$
(51)
$$\Vert \textbf{A}\textbf{R}_{t-1}\Vert \le {\overline{u}}_A\Vert \nabla F_{i}(\textbf{w}_{i,t-1})\Vert$$
(52)

Substituting eqns. (50)–(52) into B1, we have:

$$\begin{aligned} B1\le \frac{5L}{2} \left( {\overline{u}}^2(1 + {f_i}^2) + {f_i}^2{\overline{u}}_*^2 + \eta ^2 + p^2 \epsilon ^2 \varphi ^2 {\overline{u}}_A^2 \right) \Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2 \end{aligned}$$
(53)

Combining eqns. (47) and (53), we have:

$$\begin{aligned} &F_{i}(\textbf{w}_{t+1}) - F_{i}(\textbf{w}_{t}) +[{\underline{u}}{\underline{q}}(1+f_{i})+{\underline{u}}_{*}{\underline{q}}_{*}+ \eta +p\epsilon \varphi {\underline{u}}_{A}{\underline{q}}_{A}]\Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2- \\&[\frac{5L}{2} \left( {\overline{u}}^2(1 + {f_i}^2) + {f_i}^2{\overline{u}}_*^2 + \eta ^2 + p^2 \epsilon ^2 \varphi ^2 {\overline{u}}_A^2 \right) ]\Vert \nabla F_i(\textbf{w}_{i,t})\Vert ^2\le 0 \end{aligned}$$
(54)

Let \(\Phi ={\underline{u}}{\underline{q}}(1+f_{i})+{\underline{u}}_{*}{\underline{q}}_{*}+\eta +p\epsilon \varphi {\underline{u}}_{A}{\underline{q}}_{A}-\frac{5L}{2}[{\overline{u}}^2(1+{f_i}^2)+{f_i}^2{\overline{u}}_*^2 + \eta ^2 + p^2 \epsilon ^2 \varphi ^2{\overline{u}}_A^2]\), assumed positive; then:

$$\begin{aligned} &F_{i}(\textbf{w}_{t+1})-F_{i}(\textbf{w}_{t})+\Phi \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2\le 0 \\&\Leftrightarrow \Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2 \le \frac{[F_{i}(\textbf{w}_{t})-F_{i}(\textbf{w}_{t+1})]}{\Phi } \end{aligned}$$
(55)

Taking expectations on both sides:

$$\begin{aligned} {\mathbb {E}}\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2\le \frac{{\mathbb {E}}[F_{i}(\textbf{w}_{t})-F_{i}(\textbf{w}_{t+1})]}{\Phi } \end{aligned}$$
(56)

Applying eqn. (56) repeatedly, we have:

$$\begin{aligned} &{\mathbb {E}}\Vert \nabla F_{i}(\textbf{w}_{i,1})\Vert ^2\le \frac{{\mathbb {E}}[F_{i}(\textbf{w}_{1})-F_{i}(\textbf{w}_{2})]}{\Phi }\\&{\mathbb {E}}\Vert \nabla F_{i}(\textbf{w}_{i,2})\Vert ^2\le \frac{{\mathbb {E}}[F_{i}(\textbf{w}_{2})-F_{i}(\textbf{w}_{3})]}{\Phi }\\&\hspace{7em} \vdots \end{aligned}$$

Summing these inequalities, we have:

$$\begin{aligned} \sum _{t=1}^T{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_{t})\right\| ^2 \le \frac{F_{1}-F_{*}}{\Phi } \end{aligned}$$
(57)

Dividing both sides by T, we have:

$$\begin{aligned} \min _{t=1:T}{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2\le \frac{\sum _{t=1}^T{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2}{T}&\le \frac{F_{1}-F_{*}}{\Phi T} \end{aligned}$$
(58)

Assume that \(F_{1}-F_{*}\le D\). Then, if we set \(\eta \le \frac{1}{5L}\), we get:

$$\begin{aligned} \min _{t=1:T}{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2\le \frac{5LD}{T} \end{aligned}$$
(59)

Hence, the convergence rate of FedBat is \({\mathcal {O}}(\frac{1}{T})\).
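
For readability, we spell out the step from eqn. (58) to eqn. (59); it implicitly assumes that the bat-related terms in \(\Phi\) are nonnegative, so that \(\eta \le \frac{1}{5L}\) yields \(\Phi \ge \frac{1}{5L}\) (this intermediate chain is our reading, not part of the original derivation):

$$\begin{aligned} \min _{t=1:T}{\mathbb {E}}\left\| \nabla F_i(\textbf{w}_t)\right\| ^2\le \frac{F_{1}-F_{*}}{\Phi T}\le \frac{D}{\Phi T}\le \frac{D}{\frac{1}{5L}\,T}=\frac{5LD}{T} \end{aligned}$$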

Proof of Theorem 2:

In the proof of Theorem 1, we derived eqn. (59), which provides the convergence bound for FedBat. To strengthen this guarantee, we introduce the Polyak–Łojasiewicz (PL) condition. Starting from eqn. (56), we have:

$$\begin{aligned} &{\mathbb {E}}[F(\textbf{w}_{t+1})-F(\textbf{w}_{t})]+\Phi {\mathbb {E}}[\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2]\le 0 \\&\Leftrightarrow {\mathbb {E}}[F(\textbf{w}_{t+1})-F(\textbf{w}_{*})-(F(\textbf{w}_{t})-F(\textbf{w}_{*}))]+\Phi {\mathbb {E}}[\Vert \nabla F_{i}(\textbf{w}_{i,t})\Vert ^2]\le 0 \end{aligned}$$
(60)

Combining this with Assumption 2:

$$\begin{aligned} &{\mathbb {E}}[F(\textbf{w}_{t+1})-F(\textbf{w}_{*})]\le {\mathbb {E}}[F(\textbf{w}_{t})-F(\textbf{w}_{*})]-\Phi \mu {\mathbb {E}}[F(\textbf{w}_t)-F(\textbf{w}_{*})]\\&\hspace{9em}\le (1-\Phi \mu ){\mathbb {E}}[F(\textbf{w}_t)-F(\textbf{w}_{*})]\\&\hspace{9em}\le (1-\Phi \mu )^2{\mathbb {E}}[F(\textbf{w}_{t-1})-F(\textbf{w}_{*})]\\&\hspace{12em} \vdots \\&\hspace{9em}\le (1-\Phi \mu )^{t+1}{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})] \end{aligned}$$

To make this value less than or equal to \(\Lambda\), the required number of rounds \(T_{\Lambda }\) satisfies:

$$\begin{aligned}&(1-\Phi \mu )^{T_{\Lambda }}{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})] \le \Lambda \\&\Rightarrow T_{\Lambda }={\mathcal {O}}\left( \frac{1}{\Phi \mu }\log \frac{{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})]}{\Lambda }\right) \end{aligned}$$
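
The stated order of \(T_{\Lambda }\) follows by using \(1-x\le e^{-x}\) for \(x\in (0,1)\); we add this intermediate inequality for completeness:

$$\begin{aligned} (1-\Phi \mu )^{T_{\Lambda }}\le e^{-\Phi \mu T_{\Lambda }}\le \frac{\Lambda }{{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})]} \;\Longleftarrow \; T_{\Lambda }\ge \frac{1}{\Phi \mu }\log \frac{{\mathbb {E}}[F(\textbf{w}_0)-F(\textbf{w}_{*})]}{\Lambda } \end{aligned}$$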

Proof of Theorem 3:

$$\begin{aligned} \Vert \textbf{w}_{i,t+1}-\textbf{w}_{*,t+1}\Vert&=\Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}+\textbf{vR}_{i,t+1}-\textbf{vR}_{*,t+1}-\eta \nabla F_{i}(\textbf{w}_{i,t})+\eta \nabla F(\textbf{w}_{*,t})\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert \textbf{vR}_{i,t+1}-\textbf{vR}_{*,t+1}\Vert + \left\| \eta \nabla F_i\big (\textbf{w}_{i,t}\big )-\eta \nabla F\big (\textbf{w}_{*,t}\big )\right\| \end{aligned}$$
(61)

Based on the definitions of the gradients at the local clients and at the virtual client, we have \(\nabla F_i(\textbf{w}_{i,t})=\sum _{s=1}^Sm_i(b{=}s)\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{i,t})]\) and \(\nabla F(\textbf{w}_{*,t})=\sum _{s=1}^Sm(b{=}s)\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{*,t})]\).

$$\begin{aligned} &\left\| \nabla F_i\big (\textbf{w}_{i,t}\big )-\nabla F\big (\textbf{w}_{*,t}\big )\right\| \\&=\left\| \sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{i,t})]-\sum _{s=1}^Sm(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{*,t})]\right\| \\&=\left\| \sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{i,t})]-\sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{*,t})] \right. \\&\quad \left. +\sum _{s=1}^Sm_i(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}\left[ I_s\left( \textbf{a},\textbf{w}_{*,t}\right) \right] -\sum _{s=1}^Sm(b=s)\nabla {\mathbb {E}}_{\textbf{a}|b=s}\left[ I_s\left( \textbf{a},\textbf{w}_{*,t}\right) \right] \right\| \\&\le \left\| \sum _{s=1}^Sm_i(b{=}s)(\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{i,t})]-\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_{*,t})])\right\| \\&\quad +\left\| \sum _{s=1}^S(m_i(b{=}s){-}m(b{=}s))\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{*,t})]\right\| \end{aligned}$$
(62)

Let \(I_{\max }(\textbf{w}_{*,t}) = \max \{\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a}, \textbf{w}_{*,t})]\}_{s=1}^S\). Using the Lipschitz continuity \(\Vert \nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_1)]-\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_2)]\Vert \le L_s\Vert \textbf{w}_1-\textbf{w}_2\Vert\), the right-hand side of eqn. (62) can be further bounded as:

$$\begin{aligned} &\Vert \nabla F_i\left( \textbf{w}_{i,t}\right) -\nabla F(\textbf{w}_{*,t})\Vert \\&\le \sum _{s=1}^Sm_i(b{=}s)L_s\Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert + I_{\max }\big (\textbf{w}_{*,t}\big )\sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \end{aligned}$$
(63)

Then, we have:

$$\begin{aligned}&\Vert \textbf{w}_{i,t+1}-\textbf{w}_{*,t+1}\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert \textbf{vR}_{i,t+1}-\textbf{vR}_{*,t+1}\Vert +\eta \Vert \nabla F_i\big (\textbf{w}_{i,t}\big )-\nabla F\big (\textbf{w}_{*,t}\big )\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert (1+f_{i})\textbf{vR}_{i,t}-(1+f_{i})\textbf{vR}_{*,t}\Vert +\eta \Vert \nabla F_i\big (\textbf{w}_{i,t}\big )-\nabla F\big (\textbf{w}_{*,t}\big )\Vert \\&\le \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\Vert (1+f_{i})\textbf{vR}_{i,t}-(1+f_{i})\textbf{vR}_{*,t}\Vert \\&\quad +\eta \sum _{s=1}^Sm_i(b{=}s)L_s\Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +\eta I_{\max }\big (\textbf{w}_{*,t}\big )\sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \\&=\left( 1+\eta \sum _{s=1}^Sm_i(b{=}s)L_s\right) \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +(1+f_{i})\Vert \textbf{vR}_{i,t}-\textbf{vR}_{*,t}\Vert \\&\quad +\eta I_{\max }\big (\textbf{w}_{*,t}\big )\sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \end{aligned}$$
(64)

Letting \(\beta = 1 + \eta \sum _{s=1}^S m_i(b = s)L_s\), we rewrite eqn. (64) as:

$$\begin{aligned} &\Vert \textbf{w}_{i,t+1}-\textbf{w}_{*,t+1}\Vert \le \beta \Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert +(1+f_{i})\Vert \textbf{vR}_{i,t}-\textbf{vR}_{*,t}\Vert \\&\quad +\eta I_{\max }\left( \textbf{w}_{*,t}\right) \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \\&\le \beta ^2\Vert \textbf{w}_{i,t-1}-\textbf{w}_{*,t-1}\Vert +\beta (1+f_{i})\Vert \textbf{vR}_{i,t-1}-\textbf{vR}_{*,t-1}\Vert \\&\quad +\beta \eta I_{\max }\left( \textbf{w}_{*,t-1}\right) \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)|+(1+f_{i})\Vert \textbf{vR}_{i,t}-\textbf{vR}_{*,t}\Vert \\&\quad +\eta I_{\max }\left( \textbf{w}_{*,t}\right) \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)| \\&\le \beta ^{t+1}\Vert \textbf{w}_{i,0}-\textbf{w}_{*,0}\Vert +(1+f_{i})\sum _{j=0}^t\beta ^{t-j}\Vert \textbf{vR}_{i,j}-\textbf{vR}_{*,j}\Vert \\&\quad +\eta \sum _{s=1}^S|m_i(b{=}s)-m(b{=}s)|\sum _{j=0}^t\beta ^{t-j}I_{\max }\left( \textbf{w}_{*,j}\right) \end{aligned}$$
(65)

This completes the proof of Theorem 3.
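
To see how this bound behaves, the sketch below evaluates the right-hand side of eqn. (65) with scalar stand-ins; all constants (\(\beta\), \(f_i\), the velocity gaps, and the label-distribution gap) are made-up inputs, chosen only to show that the bound compounds geometrically in \(\beta\) and grows with the distribution mismatch.

```python
import numpy as np

def divergence_bound(T, beta=1.05, f_i=0.5, delta0=0.0,
                     vel_gap=0.01, eta=0.1, I_max=1.0, dist_gap=0.4):
    """Right-hand side of eqn. (65) with scalar stand-ins.
    `vel_gap` stands in for ||vR_{i,j} - vR_{*,j}|| and `dist_gap`
    for sum_s |m_i(b=s) - m(b=s)|; both held constant over rounds."""
    bounds = []
    for t in range(T):
        powers = beta ** np.arange(t, -1, -1)      # beta^{t-j}, j = 0..t
        bound = (beta ** (t + 1) * delta0
                 + (1 + f_i) * vel_gap * powers.sum()
                 + eta * I_max * dist_gap * powers.sum())
        bounds.append(bound)
    return np.array(bounds)

print(divergence_bound(5).round(3))  # the bound grows round over round
```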

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Sun, C. & Peng, Y. FedBat: a self-adapting bat algorithm-based federated learning approach. J Supercomput 81, 137 (2025). https://doi.org/10.1007/s11227-024-06514-x
