Abstract
Clustering is a well-suited tool for working with big data and for discovering structure in a data set. It aims to maximize the similarity among data points within a cluster and to minimize the similarity between data points in different clusters. This study presents an improved Particle Swarm Optimization (PSO) algorithm, Multistart Pattern Reduction-Enhanced PSO (MPREPSO), which uses pattern reduction to shorten clustering computation time. The method adds two operators, pattern reduction and multistart, to the PSO algorithm. The pattern reduction operator reduces computation time by compressing static patterns. The multistart operator helps the search avoid local optima by enforcing diversity in the population. Both operators are combined with the PSO algorithm, and the performance of the resulting method is evaluated.
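To make the roles of the two operators concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common PSO clustering encoding in which each particle holds k candidate centroids and fitness is the weighted sum of squared distances to the nearest centroid. The names `compress_static`, `pso_cluster`, and `restart_every`, the parameter values, and the specific rules (merging static patterns into a weighted mean, re-seeding the worse half of the swarm) are illustrative assumptions. How a pattern is judged static is left to the caller.

```python
import random

def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def weighted_sse(centroids, patterns):
    """patterns: list of (point, weight); each is charged to its nearest centroid."""
    return sum(w * min(sq_dist(p, c) for c in centroids) for p, w in patterns)

def compress_static(patterns, labels, static):
    """Pattern reduction (assumed form): merge the static patterns of each
    cluster into one weighted mean and keep all other patterns unchanged."""
    groups, kept = {}, []
    for (p, w), lab, s in zip(patterns, labels, static):
        if s:
            groups.setdefault(lab, []).append((p, w))
        else:
            kept.append((p, w))
    for lab in sorted(groups):
        members = groups[lab]
        total = sum(w for _, w in members)
        dim = len(members[0][0])
        mean = tuple(sum(p[d] * w for p, w in members) / total for d in range(dim))
        kept.append((mean, total))
    return kept

def pso_cluster(patterns, k, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49,
                restart_every=10, seed=0):
    rng = random.Random(seed)
    dim = len(patterns[0][0])

    def random_position():
        # A particle is k centroids, each seeded from a randomly chosen pattern.
        return [list(rng.choice(patterns)[0]) for _ in range(k)]

    pos = [random_position() for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [weighted_sse(p, patterns) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for t in range(1, iters + 1):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard inertia + cognitive + social velocity update.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = weighted_sse(pos[i], patterns)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
        # Multistart-style diversity step (assumption): re-seed the worse half.
        if t % restart_every == 0:
            order = sorted(range(n_particles), key=lambda i: pbest_fit[i])
            for i in order[n_particles // 2:]:
                pos[i] = random_position()
                vel[i] = [[0.0] * dim for _ in range(k)]
                pbest[i] = [c[:] for c in pos[i]]
                pbest_fit[i] = weighted_sse(pos[i], patterns)
    return gbest, gbest_fit
```

Because the global best is stored separately from the swarm, re-seeding particles in this sketch never discards the best solution found so far. Merging static patterns shortens the list that every fitness evaluation iterates over, which is where the time saving would come from under these assumptions.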









Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This article is part of the topical collection “Advanced Computing and Data Sciences” guest edited by Mayank Singh, Vipin Tyagi and P.K. Gupta.
Cite this article
Hashemi, S.E., Tavana, M. & Bakhshi, M. A New Particle Swarm Optimization Algorithm for Optimizing Big Data Clustering. SN COMPUT. SCI. 3, 311 (2022). https://doi.org/10.1007/s42979-022-01208-8
DOI: https://doi.org/10.1007/s42979-022-01208-8