
An Empirical Evaluation of Different Initializations on the Number of K-Means Iterations

Conference paper
Advances in Artificial Intelligence (MICAI 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7629)


Abstract

This paper presents an analysis of the number of iterations K-Means takes to converge under different initializations. We have experimented with seven initialization algorithms on a total of 37 real and synthetic datasets. We have found that hierarchy-based initializations tend to be the most effective at reducing the number of iterations, particularly a divisive algorithm using the Ward criterion when applied to real datasets.
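For illustration, the sketch below shows one way to count the iterations batch K-Means needs to converge under two common initializations, random seeding and k-means++. It is an assumption-based example, not the paper's experimental setup: the function names, the synthetic three-cluster data, and the convergence test are introduced here for demonstration, and the paper itself compares seven initializations (including hierarchical ones) on 37 datasets.

```python
# Minimal sketch (not the paper's code): count Lloyd's-algorithm iterations
# under two example initializations -- random seeding and k-means++.
import numpy as np

def init_random(X, k, rng):
    # Pick k distinct data points uniformly at random as starting centroids.
    return X[rng.choice(len(X), size=k, replace=False)]

def init_kmeanspp(X, k, rng):
    # k-means++ seeding: each new centroid is drawn with probability
    # proportional to the squared distance to the nearest centroid so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None] - np.array(centroids)) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans_iterations(X, centroids, max_iter=1000):
    # Run standard batch K-Means; return the number of iterations to converge.
    for it in range(1, max_iter + 1):
        labels = ((X[:, None] - centroids) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(len(centroids))])
        if np.allclose(new, centroids):  # centroids stopped moving
            return it
        centroids = new
    return max_iter

# Hypothetical synthetic data: three Gaussian clusters in two dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) + np.repeat(np.array([[0, 0], [5, 5], [0, 5]]), 100, axis=0)
for name, init in [("random", init_random), ("k-means++", init_kmeanspp)]:
    print(name, kmeans_iterations(X, init(X, 3, rng)))
```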




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de Amorim, R.C. (2013). An Empirical Evaluation of Different Initializations on the Number of K-Means Iterations. In: Batyrshin, I., González Mendoza, M. (eds) Advances in Artificial Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37807-2_2


  • DOI: https://doi.org/10.1007/978-3-642-37807-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37806-5

  • Online ISBN: 978-3-642-37807-2

  • eBook Packages: Computer Science (R0)
