The New K-Means Initialization Method

Brejna, Bartosz; Pietranik, Marcin; Kozierkiewicz, Adrianna

doi:10.1007/978-3-031-70816-9_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14810)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Data clustering methods are crucial in many areas of data analysis and machine learning. They are essential for organizing and understanding complex datasets by grouping data points that are similar according to an assumed distance function. These methods support customer segmentation, anomaly detection, pattern recognition and many other practical problems. Various clustering algorithms can be found in the literature, but the most popular is the k-means algorithm together with its subsequent modifications. This work presents a new method of initializing the k-means algorithm, which is tested on random and benchmark datasets. The paper discusses the importance of selecting an appropriate initialization for the k-means clustering method and demonstrates the effectiveness and superiority of the developed method.
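The abstract does not describe the new initialization procedure itself, so the sketch below does not reproduce it. To illustrate why the choice of initialization matters, it pairs standard Lloyd iterations with two well-known seeding strategies: uniform random selection of k data points, and k-means++ seeding (Arthur and Vassilvitskii, 2007), in which each further centre is drawn with probability proportional to its squared distance from the nearest centre already chosen. The function names (`random_init`, `kmeanspp_init`, `lloyd`) and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Baseline seeding (illustrative): k distinct data points chosen uniformly at random."""
    return [list(p) for p in rng.sample(points, k)]


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centre is sampled with probability
    proportional to its squared distance to the nearest existing centre.
    This is the published k-means++ scheme, NOT the paper's new method."""
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centre; fall back to a uniform pick.
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centres.append(list(points[idx]))
    return centres


def lloyd(points, centres, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest centre, then move
    each centre to the mean of its cluster, until the centres stop changing."""
    centres = [list(c) for c in centres]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        new = []
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centres[j])  # keep an empty cluster's centre unchanged
        if new == centres:
            break
        centres = new
    # Sum of squared errors (within-cluster sum of squares) for the final centres.
    sse = sum(min(sq_dist(p, c) for c in centres) for p in points)
    return centres, labels, sse


if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic Gaussian blobs (illustrative data, not from the paper).
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(50)]
    for name, init in [("random", random_init), ("k-means++", kmeanspp_init)]:
        _, _, sse = lloyd(data, init(data, 3, rng))
        print(f"{name:10s} SSE = {sse:.2f}")
```

Across different seeds, random seeding may occasionally converge to a poorer local minimum (a higher SSE) than k-means++ seeding; this sensitivity to the starting centres is what initialization methods for k-means aim to reduce.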

Author information

Authors and Affiliations

Faculty of Information and Communication Technology, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Bartosz Brejna, Marcin Pietranik & Adrianna Kozierkiewicz

Corresponding author

Correspondence to Marcin Pietranik.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
University of Leipzig, Leipzig, Germany
Bogdan Franczyk
University of Leipzig, Leipzig, Sachsen, Germany
André Ludwig
Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, The Netherlands
Jan Treur
University of Münster, Münster, Germany
Gottfried Vossen
Wrocław University of Science and Technology, Wrocław, Poland
Adrianna Kozierkiewicz

About this paper

Cite this paper

Brejna, B., Pietranik, M., Kozierkiewicz, A. (2024). The New K-Means Initialization Method. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2024. Lecture Notes in Computer Science, vol 14810. Springer, Cham. https://doi.org/10.1007/978-3-031-70816-9_29

DOI: https://doi.org/10.1007/978-3-031-70816-9_29
Published: 28 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70815-2
Online ISBN: 978-3-031-70816-9
eBook Packages: Computer Science, Computer Science (R0)
