A New Particle Swarm Optimization Algorithm for Optimizing Big Data Clustering

  • Original Research
  • Published in SN Computer Science

Abstract

Clustering is a well-suited tool for analyzing big data and discovering structure in a data set. It aims to maximize the similarity among data points within the same cluster and to minimize the similarity between data points in different clusters. This study presents an improved Particle Swarm Optimization (PSO) algorithm, Multistart Pattern Reduction-Enhanced PSO (MPREPSO), which uses pattern reduction to shorten clustering computation time. The method adds two operators to PSO: a pattern reduction operator and a multistart operator. The pattern reduction operator reduces computation time by compressing static patterns, that is, patterns that are unlikely to change clusters in later iterations. The multistart operator helps the search avoid becoming trapped in local optima by enforcing diversity in the population. Both operators are combined with the PSO algorithm, and the performance of the resulting method is evaluated.
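The abstract describes MPREPSO only at a high level. It does not give the exact static-pattern criterion, the multistart trigger, or the parameter settings. The following is therefore a minimal sketch under stated assumptions, not the authors' implementation. Each particle encodes k centroids, and fitness is the weighted sum of squared errors (SSE). The "static" patterns are assumed to be the half of each cluster closest to the current global-best centroid, and they are merged into one weighted mean pattern. "Multistart" is assumed to mean re-seeding every particle except the best one after `patience` iterations without improvement. The function names (`mprepso_sketch`, `reduce_patterns`) and the parameter values (inertia 0.72, acceleration coefficients 1.49) are hypothetical, illustrative choices.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_sse(patterns, weights, centroids):
    """Fitness: weighted sum of squared distances to the nearest centroid."""
    return sum(w * min(sq_dist(p, c) for c in centroids)
               for p, w in zip(patterns, weights))


def reduce_patterns(patterns, weights, centroids, keep_ratio=0.5):
    """Assumed (hypothetical) reduction rule: within each cluster, the patterns
    closest to the centroid are treated as static and merged into one weighted mean."""
    groups = {}
    for p, w in zip(patterns, weights):
        j = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        groups.setdefault(j, []).append((p, w))
    new_p, new_w = [], []
    for j, members in groups.items():
        members.sort(key=lambda pw: sq_dist(pw[0], centroids[j]))
        n_static = int(len(members) * (1 - keep_ratio))
        static, active = members[:n_static], members[n_static:]
        if len(static) > 1:
            total = sum(w for _, w in static)
            mean = [sum(p[d] * w for p, w in static) / total
                    for d in range(len(static[0][0]))]
            new_p.append(mean)
            new_w.append(total)
        else:
            active = static + active  # nothing worth merging in this cluster
        for p, w in active:
            new_p.append(p)
            new_w.append(w)
    return new_p, new_w


def mprepso_sketch(data, k, n_particles=10, iters=60, reduce_every=20,
                   patience=10, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Hypothetical PSO clustering loop with pattern reduction and multistart."""
    rng = random.Random(seed)
    dim = len(data[0])
    patterns, weights = [list(p) for p in data], [1.0] * len(data)

    def random_position():
        # A particle is k centroids, seeded at randomly chosen data points.
        return [list(p) for p in rng.sample(data, k)]

    def fitness(pos):
        # Reads the current (possibly reduced) pattern set from the enclosing scope.
        return weighted_sse(patterns, weights, pos)

    pos = [random_position() for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f, stall = [c[:] for c in pbest[g]], pbest_f[g], 0

    for t in range(1, iters + 1):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = [c[:] for c in pos[i]], f
        g = min(range(n_particles), key=lambda i: pbest_f[i])
        if pbest_f[g] < gbest_f:
            gbest, gbest_f, stall = [c[:] for c in pbest[g]], pbest_f[g], 0
        else:
            stall += 1
        # Pattern reduction: periodically compress static patterns around gbest.
        if t % reduce_every == 0:
            patterns, weights = reduce_patterns(patterns, weights, gbest)
            pbest_f = [fitness(p) for p in pbest]
            gbest_f = fitness(gbest)
        # Multistart: after `patience` stalled iterations, re-seed all but particle g.
        if stall >= patience:
            for i in range(n_particles):
                if i != g:
                    pos[i] = random_position()
                    vel[i] = [[0.0] * dim for _ in range(k)]
                    pbest[i], pbest_f[i] = [c[:] for c in pos[i]], fitness(pos[i])
            stall = 0
    # Report the SSE of the final centroids on the full, unreduced data set.
    return gbest, weighted_sse(data, [1.0] * len(data), gbest)
```

Merging static patterns into their weighted mean is a reasonable compression for an SSE fitness. For any centroid c, the weighted squared error of a group equals W·‖m − c‖² plus a term that does not depend on c, where W is the total weight and m is the weighted mean. So while the merged patterns stay in the same cluster, the ranking of candidate solutions is unchanged, but each fitness evaluation touches fewer patterns.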

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Corresponding author

Correspondence to Seyed Emadedin Hashemi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This article is part of the topical collection “Advanced Computing and Data Sciences” guest edited by Mayank Singh, Vipin Tyagi and P.K. Gupta.

About this article

Cite this article

Hashemi, S.E., Tavana, M. & Bakhshi, M. A New Particle Swarm Optimization Algorithm for Optimizing Big Data Clustering. SN COMPUT. SCI. 3, 311 (2022). https://doi.org/10.1007/s42979-022-01208-8

  • DOI: https://doi.org/10.1007/s42979-022-01208-8
