Abstract
Data clustering methods are central to many areas of data analysis and machine learning. They help organize and understand complex datasets by grouping data points that are similar under an assumed distance function. These methods support customer segmentation, anomaly detection, pattern recognition, and many other practical tasks. Many clustering algorithms can be found in the literature, but the most popular is the k-means algorithm, together with its subsequent modifications. This work presents a new method of initializing the k-means algorithm and tests it on random and benchmark datasets. The paper also discusses the importance of selecting an appropriate initialization for k-means clustering and demonstrates the effectiveness and superiority of the developed method.
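To illustrate why initialization matters, the following minimal sketch runs standard Lloyd-style k-means from two well-known seedings, uniform random selection and k-means++, and compares the resulting sum of squared errors (SSE) over several seeds. It does not implement the method proposed in this paper, which the abstract does not describe. The synthetic three-blob dataset, the function names, and all parameters are assumptions chosen only for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_random(points, k, rng):
    """Baseline seeding: k distinct data points chosen uniformly at random."""
    return rng.sample(points, k)

def init_kmeanspp(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes the data contain at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def lloyd(points, centers, max_iter=100):
    """Lloyd iterations from the given initial centers.
    Returns the final centers and the sum of squared errors (SSE)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

# Hypothetical toy data: three Gaussian blobs in the plane.
rng = random.Random(0)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in blobs for _ in range(50)]

for name, init in [("random", init_random), ("k-means++", init_kmeanspp)]:
    sses = [lloyd(points, init(points, 3, random.Random(seed)))[1]
            for seed in range(20)]
    print(name, "best SSE:", round(min(sses), 1),
          "worst SSE:", round(max(sses), 1))
```

A wide gap between the best and worst SSE for a given seeding indicates that the final clustering depends strongly on the starting centers, which is the sensitivity that initialization methods aim to reduce.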
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Brejna, B., Pietranik, M., Kozierkiewicz, A. (2024). The New K-Means Initialization Method. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2024. Lecture Notes in Computer Science(), vol 14810. Springer, Cham. https://doi.org/10.1007/978-3-031-70816-9_29
DOI: https://doi.org/10.1007/978-3-031-70816-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70815-2
Online ISBN: 978-3-031-70816-9
eBook Packages: Computer Science, Computer Science (R0)