Abstract
Clustering remains one of the most common unsupervised learning techniques in data mining because it enables the discovery of meaningful patterns, knowledge, rules and associations in large-scale datasets. K-medoids, a variant of K-means, is a popular clustering method that seeks the best combination of K medoids among all candidate combinations. It has been applied successfully to various real-life problems owing to its simplicity and effectiveness. However, because the number of possible combinations of K medoids is exponential, finding the optimal one within a reasonable amount of time is extremely challenging. In this work, we therefore formulate K-medoids clustering as an optimization problem and combine two effective Swarm Intelligence (SI) algorithms, the Firefly Algorithm (FA) and Particle Swarm Optimization (PSO), to select an appropriate combination of K medoids. We extensively evaluate the proposed FA-PSO approach for K-medoids-based clustering, abbreviated FA-PSO-KMED, on 10 UCI datasets. We first use the Iterated F-Race (I/F-Race) algorithm to determine the optimal parameter settings for FA and PSO. We then compare FA-PSO-KMED with the well-known state-of-the-art K-medoids-based clustering algorithms PAM, CLARA and CLARANS, and with 11 popular swarm intelligence algorithms: PSO, ABC, CS, FA, BA, APSO, EHO, HHO, SMA, AO and RSA. Experimental results and statistical analysis show that FA-PSO-KMED is very promising and achieves a significant improvement over the other clustering algorithms.
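To make the optimization view concrete, the following minimal Python sketch scores a candidate set of K medoids and finds the best set by exhaustive search on a toy dataset. It is an illustration under stated assumptions, not the FA-PSO-KMED method: the objective used here (the sum of Euclidean distances from each point to its nearest medoid) is the standard K-medoids criterion and may differ from the paper's exact fitness function, and the names `kmedoids_cost`, `brute_force_medoids` and the toy `points` are hypothetical.

```python
from itertools import combinations
from math import comb, dist


def kmedoids_cost(points, medoid_idx):
    """Sum of Euclidean distances from each point to its nearest medoid.

    Assumption: this is the usual K-medoids objective, used for illustration.
    """
    return sum(min(dist(p, points[m]) for m in medoid_idx) for p in points)


def brute_force_medoids(points, k):
    """Score every K-subset of point indices; feasible only for tiny inputs."""
    return min(combinations(range(len(points)), k),
               key=lambda idx: kmedoids_cost(points, idx))


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

best = brute_force_medoids(points, k=2)
best_cost = kmedoids_cost(points, best)
print(best, best_cost)  # (0, 3) 4.0

# Size of the search space for a modest dataset: choose 3 medoids from 150 points.
n_candidates = comb(150, 3)
print(n_candidates)  # 551300
```

Exhaustive search already has to score 551300 candidate sets for 150 points and K = 3, and the count grows rapidly with the dataset size and K, which is why a metaheuristic search over medoid combinations is attractive.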
Acknowledgements
We gratefully acknowledge the Directorate General for Scientific Research and Technological Development (DGRSDT) for supporting this work under grant number C0662300.
Funding
This work was supported by the Directorate General for Scientific Research and Technological Development (DGRSDT) under grant number C0662300.
Author information
Contributions
Supervision: Habiba Drias. Concept and design: Ilyes Khennak, Faysal Bendakir and Samy Hamdi. Data collection and/or processing: Ilyes Khennak, Faysal Bendakir and Samy Hamdi. Analysis and/or interpretation: Ilyes Khennak. Literature search: Ilyes Khennak. Manuscript writing: Ilyes Khennak. Critical review: Habiba Drias and Yassine Drias.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
The authors consent to the declarations and to the publication of this article.
About this article
Cite this article
Khennak, I., Drias, H., Drias, Y. et al. I/F-Race tuned firefly algorithm and particle swarm optimization for K-medoids-based clustering. Evol. Intel. 16, 351–373 (2023). https://doi.org/10.1007/s12065-022-00794-z