
An Empirical Evaluation of Different Initializations on the Number of K-Means Iterations

Conference paper
Advances in Artificial Intelligence (MICAI 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7629)


Abstract

This paper presents an analysis of the number of iterations K-Means takes to converge under different initializations. We have experimented with seven initialization algorithms on a total of 37 real and synthetic datasets. We have found that hierarchy-based initializations tend to be the most effective at reducing the number of iterations, particularly a divisive algorithm using the Ward criterion when applied to real datasets.
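For illustration, the sketch below shows one way to count the iterations batch K-Means needs to converge under two common initializations, random seeding and k-means++. It is an assumption-based example, not the paper's experimental setup: the function names, the synthetic three-cluster data, and the convergence test are introduced here for demonstration, and the paper itself compares seven initializations (including hierarchical ones) on 37 datasets.

```python
# Minimal sketch (not the paper's code): count Lloyd's-algorithm iterations
# under two example initializations -- random seeding and k-means++.
import numpy as np

def init_random(X, k, rng):
    # Pick k distinct data points uniformly at random as starting centroids.
    return X[rng.choice(len(X), size=k, replace=False)]

def init_kmeanspp(X, k, rng):
    # k-means++ seeding: each new centroid is drawn with probability
    # proportional to the squared distance to the nearest centroid so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None] - np.array(centroids)) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans_iterations(X, centroids, max_iter=1000):
    # Run standard batch K-Means; return the number of iterations to converge.
    for it in range(1, max_iter + 1):
        labels = ((X[:, None] - centroids) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(len(centroids))])
        if np.allclose(new, centroids):  # centroids stopped moving
            return it
        centroids = new
    return max_iter

# Hypothetical synthetic data: three Gaussian clusters in two dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) + np.repeat(np.array([[0, 0], [5, 5], [0, 5]]), 100, axis=0)
for name, init in [("random", init_random), ("k-means++", init_kmeanspp)]:
    print(name, kmeans_iterations(X, init(X, 3, rng)))
```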




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de Amorim, R.C. (2013). An Empirical Evaluation of Different Initializations on the Number of K-Means Iterations. In: Batyrshin, I., González Mendoza, M. (eds) Advances in Artificial Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37807-2_2


  • DOI: https://doi.org/10.1007/978-3-642-37807-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37806-5

  • Online ISBN: 978-3-642-37807-2

  • eBook Packages: Computer Science (R0)
