Abstract
Federated learning (FL) is a distributed machine learning (ML) framework designed to address data silos and data privacy. In real-world applications, non-convex optimization and non-independent and identically distributed (non-IID) client data reduce training efficiency, trap models in local optima, and degrade performance. We therefore propose an FL scheme based on the bat algorithm (FedBat), which leverages the echolocation mechanism of bats to balance global and local search, allowing the algorithm to escape local optima with a certain probability. By combining optimization around the global-best model weights with dynamically adjusted search strategies, FedBat guides weaker client models toward the global optimum, thereby accelerating convergence. FedBat also adapts its parameters across different datasets. To mitigate client drift, we extend FedBat with the Jensen–Shannon (JS) divergence to quantify the difference between local and global models; each client decides whether to upload its local model based on this divergence, which enhances the global model's generalization and reduces communication overhead. Experimental results show that FedBat converges 5 times faster and improves test accuracy by more than 40\(\%\) compared with FedAvg. The extended FedBat effectively mitigates the loss in the global model's generalization performance and reduces communication costs by approximately 20\(\%\). Comparisons with FedPso, FedGwo, and FedProx show that FedBat achieves superior convergence speed and test accuracy. Furthermore, we derive the expected convergence rate of FedBat, analyze the impact of its parameters on FL performance, and establish an upper bound on its model divergence.
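To make the mechanics concrete, below is a minimal Python sketch of the two ideas summarized above: a bat-style local update (random frequency, velocity, loudness, and pulse-emission rate, followed by an SGD refinement) and a JS-divergence gate on whether a client uploads its model. All names, hyperparameters, and the direction of the upload rule (skipping clients that have drifted too far from the global model) are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

class BatClient:
    """One FL client; its weight vector plays the role of a bat's position."""

    def __init__(self, dim, f_min=0.0, f_max=2.0, loudness=0.9, pulse_rate=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w = 0.01 * self.rng.standard_normal(dim)   # local model weights (position)
        self.v = np.zeros(dim)                           # velocity
        self.f_min, self.f_max = f_min, f_max
        self.A, self.r = loudness, pulse_rate            # loudness, pulse-emission rate

    def local_update(self, w_global, grad_fn, lr=0.01):
        """One bat-algorithm step followed by an SGD refinement."""
        # Echolocation: a random frequency scales the velocity update relative
        # to the global-best model, as in the standard bat algorithm.
        freq = self.f_min + (self.f_max - self.f_min) * self.rng.random()
        self.v = self.v + (self.w - w_global) * freq
        candidate = self.w + self.v
        # With probability (1 - r), take a local random walk around the global
        # best, scaled by the current loudness; this helps escape local optima.
        if self.rng.random() > self.r:
            candidate = w_global + 0.01 * self.A * self.rng.standard_normal(self.w.shape)
        # SGD refinement on the client's own data (grad_fn returns a stochastic gradient).
        candidate = candidate - lr * grad_fn(candidate)
        # Accept the candidate with probability tied to loudness, then anneal:
        # loudness decreases, pulse-emission rate increases (standard bat schedule).
        if self.rng.random() < self.A:
            self.w = candidate
            self.A *= 0.95
            self.r = 1.0 - 0.95 * (1.0 - self.r)
        return self.w

def should_upload(local_label_dist, global_label_dist, threshold=0.1):
    """JS-divergence gate: skip the upload when the local model/data has drifted
    too far from the global reference (direction of the rule assumed here)."""
    return js_divergence(local_label_dist, global_label_dist) <= threshold
```

In such a sketch, a server-side loop would call local_update on the sampled clients and aggregate only the models that pass should_upload, which is where the communication savings come from.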






Data availability
No datasets were generated or analysed during the current study.
Author information
Contributions
J.W., C.S., and Y.P. equally contributed to the conception and design of the study, data collection and analysis, and writing of the manuscript. All authors reviewed and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work is supported by the National Natural Science Foundation of China (No. 41672114).
Appendix 1
In this section, we perform a convergence analysis of the proposed FedBat algorithm. Each participating client uses the bat algorithm combined with SGD to update its local model. We repeatedly use the following facts; the assumptions invoked below are those stated in the main text.
- Fact 1: \(\langle \textbf{p}, \textbf{q} \rangle = \Vert \textbf{p} \Vert \Vert \textbf{q} \Vert \cos \theta \ge -\Vert \textbf{p} \Vert \Vert \textbf{q} \Vert \)
- Fact 2: \(\Vert \textbf{p}+\textbf{q}\Vert \le \Vert \textbf{p}\Vert +\Vert \textbf{q}\Vert\)
- Fact 3: \(\Vert \textbf{p}\textbf{q}\Vert \le \Vert \textbf{p}\Vert \Vert \textbf{q}\Vert\)
- Fact 4: For any set of vectors \(\left\{ \textbf{p}_i\right\} _{i=1}^n\) and nonnegative scalars \(\left\{ q_i\right\} _{i=1}^n\), define \(s=\sum _{i=1}^n q_i\). Then, by Jensen's inequality, \(\left| \left| \sum _{i=1}^n\frac{q_i}{s}\textbf{p}_i\right| \right| ^2\le \sum _{i=1}^n\frac{q_i}{s}\Vert \textbf{p}_i\Vert ^2\).
- Fact 5: For any set of vectors \(\left\{ \textbf{p}_i\right\} _{i=1}^n\), we have \(\left\| \sum _{i=1}^n\textbf{p}_i\right\| ^2\le n\sum _{i=1}^n\Vert \textbf{p}_i\Vert ^2\).
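As a brief check (not part of the original derivation), Fact 5 follows from Fact 4 by taking \(q_i = 1\) for all \(i\), so that \(s = n\): \(\left\| \sum _{i=1}^n\textbf{p}_i\right\| ^2 = n^2\left\| \sum _{i=1}^n\frac{1}{n}\textbf{p}_i\right\| ^2 \le n^2\cdot \frac{1}{n}\sum _{i=1}^n\Vert \textbf{p}_i\Vert ^2 = n\sum _{i=1}^n\Vert \textbf{p}_i\Vert ^2\).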
Proof of Theorem 1:
According to Assumption 1 (\(F_i(\cdot )\) is \(L\)-smooth), we have:
We next focus on bounding A. According to Fact 1 and eqn. (6), we have:
By combining eqns. (14) and (15), it follows that:
By combining Fact 2 and eqn. (10), we have:
where (a) uses Fact 2 again.
According to eqns. (20)-(25), we have:
Substituting eqns. (44)-(46) into A2, we have:
We next aim to bound B.
By combining Fact 2 and eqn. (11), we have:
where (b) uses Jensen's inequality from Fact 5 and expands the quadratic term.
According to eqns. (23)-(25), we have:
Substituting eqns. (50)-(52) into B1, we have:
Using eqns. (53) and (47), we complete the proof.
Letting \(\Phi ={\underline{u}}{\underline{q}}(1+f_{i})+{\underline{u}}_{*}{\underline{q}}_{*}+\eta +p\epsilon \varphi {\underline{u}}_{A}{\underline{q}}_{A}-\frac{5L}{2}[{\overline{u}}^2(1+f_i^2)+f_i^2{\overline{u}}_*^2+\eta ^2+p^2\epsilon ^2\varphi ^2{\overline{u}}_A^2]\), we have:
Taking expectations on both sides:
Applying eqn. (56) recursively:
Summing the above, we have:
Dividing both sides by T, we have:
Assume that \(F_{1}-F_{*}\le D\). Then, setting \(\eta \le \frac{1}{5L}\), we obtain:
We can see that the convergence rate of FedBat satisfies \({\mathcal {O}}(\frac{1}{T})\).
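In particular, since \(\min _{1\le t\le T}{\mathbb {E}}\Vert \nabla F(\textbf{w}_t)\Vert ^2\le \frac{1}{T}\sum _{t=1}^{T}{\mathbb {E}}\Vert \nabla F(\textbf{w}_t)\Vert ^2\), the \({\mathcal {O}}(\frac{1}{T})\) rate for the averaged quantity also applies to the best iterate (here we write \(\textbf{w}_t\) for the global model at round \(t\); this remark only restates the claim above).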
Proof of Theorem 2:
In the preceding proof, we derived eqn. (59), which provides the convergence bound for FedBat. To strengthen these guarantees, we introduce the Polyak-Łojasiewicz (PL) condition. Starting from eqn. (56), we have:
By combining this with Assumption 2:
To make this value less than or equal to \(\Lambda\), we have:
Proof of Theorem 3:
Based on the definitions of gradients at both the local clients and the virtual client, the gradients are given by \(\nabla F_i(\textbf{w}_{i,t})=\sum _{s=1}^Sm_i(b{=}s)\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{i,t})]\) and \(\nabla F_i(\textbf{w}_{*,t})=\sum _{s=1}^Sm_i(b{=}s)\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{*,t})]\).
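As an illustrative intermediate bound that we spell out for readability (it follows from the definitions above, the triangle inequality, and the per-class Lipschitz property quoted next, and is not meant to reproduce the paper's eqn. (62) verbatim): \(\Vert \nabla F_i(\textbf{w}_{i,t})-\nabla F_i(\textbf{w}_{*,t})\Vert \le \sum _{s=1}^S m_i(b{=}s)\Vert \nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{i,t})]-\nabla {\mathbb {E}}_{\textbf{a}|b{=}s}[I_s(\textbf{a},\textbf{w}_{*,t})]\Vert \le \Big(\sum _{s=1}^S m_i(b{=}s)L_s\Big)\Vert \textbf{w}_{i,t}-\textbf{w}_{*,t}\Vert \).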
Let \(I_{\text {max}}(\textbf{w}_{*,t}) = \max \{\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a}, \textbf{w}_{*,t})]\}_{s=1}^S\). Using the Lipschitz continuity \(\Vert \nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_1)]-\nabla {\mathbb {E}}_{\textbf{a}|b=s}[I_s(\textbf{a},\textbf{w}_2)]\Vert \le L_s\Vert \textbf{w}_1-\textbf{w}_2\Vert\), eqn. (62) can be further rewritten as:
Then, we have:
Letting \(\beta = 1 + \eta \sum _{s=1}^S m_i(b = s)L_s\), we rewrite eqn. (64) as:
This completes the proof.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Sun, C. & Peng, Y. FedBat: a self-adapting bat algorithm-based federated learning approach. J Supercomput 81, 137 (2025). https://doi.org/10.1007/s11227-024-06514-x