A multi-threaded particle swarm optimization-kmeans algorithm based on MapReduce

Cluster Computing

A Correction to this article was published on 31 May 2024

This article has been updated

Abstract

The particle swarm optimization-K-Means algorithm was proposed to improve the clustering accuracy of the K-Means algorithm. However, it adds computational burden, and its computational efficiency is low on large data sets. To solve this problem, a parallel particle swarm optimization-K-Means algorithm based on MapReduce with multi-threading is proposed. The algorithm performs parallel computation by dividing the particle swarm into several equal-sized sub-populations, based on the number of available nodes in the cluster, and distributing them across the nodes. It uses multi-threaded execution in the evaluation stage, which has the highest computational complexity in the evolutionary process. Experiments show that although splitting the population affects the optimization to some extent, the proposed algorithm can still effectively optimize the clustering results of the K-Means algorithm. Its computational efficiency is also significantly higher than that of the serial particle swarm optimization-K-Means algorithm and of the MapReduce-based non-multithreaded particle swarm optimization-K-Means algorithm. In the experiment with the largest dataset and a configuration of 16 nodes, the proposed algorithm is 58 times faster than the serial algorithm. Furthermore, computing efficiency can be improved further in clusters with more CPU cores.
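
The two ideas in the abstract, splitting the swarm into equal-sized sub-populations (one per node) and evaluating particles with multiple threads, can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names `split_swarm`, `sse`, and `evaluate_subpopulation` are hypothetical, the fitness is assumed to be the sum of squared distances to the nearest centroid (the abstract does not state the fitness function), and MapReduce itself is not modeled. Note also that Python threads do not speed up CPU-bound work because of the global interpreter lock, so the sketch shows only the structure of the evaluation stage.

```python
from concurrent.futures import ThreadPoolExecutor


def split_swarm(swarm, num_nodes):
    """Divide the swarm into num_nodes equal-sized sub-populations."""
    if num_nodes <= 0 or len(swarm) % num_nodes != 0:
        raise ValueError("swarm size must be divisible by the number of nodes")
    size = len(swarm) // num_nodes
    return [swarm[i * size:(i + 1) * size] for i in range(num_nodes)]


def sse(centroids, points):
    """Assumed fitness: sum of squared distances to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
        )
    return total


def evaluate_subpopulation(subpop, points, num_threads=4):
    """Evaluate every particle of one sub-population using a thread pool.

    Each particle is a list of candidate centroids. Results are returned
    in the same order as the particles in subpop.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda particle: sse(particle, points), subpop))


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    swarm = [
        [(0.0, 0.5), (10.0, 10.5)],
        [(0.0, 0.0), (0.0, 1.0)],
        [(5.0, 5.0), (6.0, 6.0)],
        [(0.0, 0.0), (10.0, 10.0)],
    ]
    # One sub-population per (simulated) node.
    for node_id, subpop in enumerate(split_swarm(swarm, num_nodes=2)):
        print(node_id, evaluate_subpopulation(subpop, points, num_threads=2))
```

In a real deployment each sub-population would be handled by a separate map task on its own node, and the thread pool would be replaced by a mechanism that gives true CPU parallelism.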

Data availability

The datasets employed in the experiments are publicly accessible through the UCI Machine Learning Repository. These datasets are available for non-commercial use and can be found at http://archive.ics.uci.edu/ml/index.php.

Funding

This work is partially supported by the National Natural Science Foundation of China (62276032).

Author information

Contributions

The authors confirm their contributions to the paper as follows: study conception and design: Xikang Wang, Tongxi Wang; data collection: Hua Xiang; analysis and interpretation of results: Xikang Wang; draft manuscript preparation: Xikang Wang, Tongxi Wang. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Tongxi Wang.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: section headings in the article were formatted incorrectly and have now been corrected.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Wang, T. & Xiang, H. A multi-threaded particle swarm optimization-kmeans algorithm based on MapReduce. Cluster Comput 27, 8031–8044 (2024). https://doi.org/10.1007/s10586-024-04456-w

  • DOI: https://doi.org/10.1007/s10586-024-04456-w
