
The New K-Means Initialization Method

  • Conference paper
  • First Online:
Computational Collective Intelligence (ICCCI 2024)

Abstract

Data clustering methods are crucial in many areas of data analysis and machine learning. They help organize and understand complex datasets by grouping data points that are similar according to an assumed distance function, and they underpin practical tasks such as customer segmentation, anomaly detection, and pattern recognition. Many clustering algorithms can be found in the literature, but the most popular remains the k-means algorithm and its numerous modifications. This work presents a new method for initializing the k-means algorithm and evaluates it on random and benchmark datasets. The paper discusses why the choice of initialization matters for k-means clustering and demonstrates the effectiveness and superiority of the developed method.
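For context on why the starting centroids matter, the following minimal sketch compares two standard baseline initializations, random seeding and k-means++, on synthetic data. It is an illustration only: the paper's new initialization method is not described in this preview, and the library (scikit-learn), dataset, and parameters are assumptions of this sketch rather than details taken from the paper.

    # Minimal sketch (not the paper's method): comparing two standard
    # k-means initializations with scikit-learn on synthetic data.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic benchmark: three well-separated Gaussian clusters.
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

    for init in ("random", "k-means++"):
        # n_init=1 runs a single initialization so its effect is visible directly.
        km = KMeans(n_clusters=3, init=init, n_init=1, random_state=0).fit(X)
        # inertia_ is the sum of squared distances to the nearest centroid;
        # a better initialization typically yields lower inertia and fewer iterations.
        print(f"{init:10s}  inertia={km.inertia_:.2f}  iterations={km.n_iter_}")

With a single run per setting, the sensitivity of Lloyd's algorithm to its starting centroids shows up as differences in final inertia and in the number of iterations needed to converge, which is the behaviour that dedicated initialization methods aim to improve.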



Author information

Corresponding author

Correspondence to Marcin Pietranik.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Brejna, B., Pietranik, M., Kozierkiewicz, A. (2024). The New K-Means Initialization Method. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2024. Lecture Notes in Computer Science, vol 14810. Springer, Cham. https://doi.org/10.1007/978-3-031-70816-9_29

  • DOI: https://doi.org/10.1007/978-3-031-70816-9_29

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70815-2

  • Online ISBN: 978-3-031-70816-9

  • eBook Packages: Computer Science, Computer Science (R0)
