A multi-threaded particle swarm optimization-kmeans algorithm based on MapReduce

Wang, Xikang; Wang, Tongxi; Xiang, Hua

doi:10.1007/s10586-024-04456-w

Published: 06 April 2024

Volume 27, pages 8031–8044, (2024)

Cluster Computing

A Correction to this article was published on 31 May 2024

This article has been updated

Abstract

The particle swarm optimization-K-Means algorithm was proposed to improve the clustering accuracy of the K-Means algorithm. However, it adds computational burden, and its efficiency is low on large data sets. To solve this problem, a parallel particle swarm optimization-K-Means algorithm based on MapReduce with multi-threading is proposed. The algorithm divides the particle swarm into equal-sized sub-populations, one per available node in the cluster, and distributes them to the nodes for parallel computation. It uses multi-threaded execution in the evaluation stage, which has the highest computational complexity in the evolutionary process. Experiments show that, although splitting the population affects the optimization to some extent, the proposed algorithm can still effectively optimize the clustering results of the K-Means algorithm. Its computational efficiency is significantly better than that of both the serial particle swarm optimization-K-Means algorithm and the MapReduce-based non-multithreaded particle swarm optimization-K-Means algorithm. In the experiment with the largest dataset and a configuration of 16 nodes, the proposed algorithm is 58 times faster than the serial algorithm. Furthermore, computing efficiency can be improved further in clusters with more CPU cores.
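
The two ideas in the abstract, splitting the swarm into equal-sized sub-populations and evaluating particles with several threads, can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the function names `split_swarm`, `sse_fitness`, and `evaluate_subpopulation` are hypothetical, the within-cluster sum of squared errors is an assumed fitness function, and a local thread pool stands in for the per-node work that a MapReduce job would do. In CPython the global interpreter lock limits the speedup of pure-Python threads, so the sketch shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def split_swarm(swarm, num_nodes):
    """Split the swarm into num_nodes equal-sized sub-populations."""
    if len(swarm) % num_nodes != 0:
        raise ValueError("swarm size must be divisible by the number of nodes")
    size = len(swarm) // num_nodes
    return [swarm[i * size:(i + 1) * size] for i in range(num_nodes)]


def sse_fitness(centroids, points):
    """Assumed fitness: sum of squared distances to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def evaluate_subpopulation(sub_population, points, num_threads=4):
    """Evaluate every particle of one sub-population with a thread pool."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves the order of the particles in its results.
        return list(pool.map(lambda particle: sse_fitness(particle, points),
                             sub_population))


# Toy data: each particle encodes k = 2 candidate centroids in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
rng = random.Random(0)
swarm = [[(rng.uniform(0, 11), rng.uniform(0, 11)) for _ in range(2)]
         for _ in range(8)]

# Four simulated nodes, each receiving two particles.
sub_populations = split_swarm(swarm, num_nodes=4)
fitness = [evaluate_subpopulation(sp, points, num_threads=2)
           for sp in sub_populations]
```

Each sub-population here evolves independently of the others, which is why splitting the population can affect the optimization result, as the abstract notes.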

Data availability

The datasets employed in the experiments are publicly accessible through the UCI Machine Learning Repository. These datasets are available for non-commercial use and can be found at http://archive.ics.uci.edu/ml/index.php.

Change history

31 May 2024
A Correction to this paper has been published: https://doi.org/10.1007/s10586-024-04572-7

Funding

This work is partially supported by the National Natural Science Foundation of China (62276032).

Author information

Authors and Affiliations

School of Computer Science, Yangtze University, Jingzhou, Hubei, China
Xikang Wang, Tongxi Wang & Hua Xiang

Contributions

The authors confirm contribution to the paper as follows: study conception and design: Xikang Wang, Tongxi Wang; data collection: Hua Xiang; analysis and interpretation of results: Xikang Wang; draft manuscript preparation: Xikang Wang, Tongxi Wang. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Tongxi Wang.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: section headings in the article were formatted incorrectly and have now been corrected.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Wang, T. & Xiang, H. A multi-threaded particle swarm optimization-kmeans algorithm based on MapReduce. Cluster Comput 27, 8031–8044 (2024). https://doi.org/10.1007/s10586-024-04456-w

Received: 05 February 2024
Revised: 12 March 2024
Accepted: 19 March 2024
Published: 06 April 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s10586-024-04456-w
