Abstract
This paper presents an analysis of the number of iterations K-Means takes to converge under different initializations. We experimented with seven initialization algorithms on a total of 37 real and synthetic datasets. We found that hierarchical-based initializations tend to be the most effective at reducing the number of iterations, especially a divisive algorithm using the Ward criterion when applied to real datasets.
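To make the measured quantity concrete, the following is a minimal sketch of how one might count K-Means iterations under two initializations. It is not the experimental setup of the paper: the abstract does not name the seven algorithms, so uniform random seeding and a k-means++-style seeding are used here purely as illustrative stand-ins, and the function names (`random_init`, `plus_plus_init`, `kmeans_iterations`, `make_blobs`), the synthetic data, and the convention for counting an iteration are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Pick k distinct data points uniformly at random as initial centroids."""
    return rng.sample(points, k)


def plus_plus_init(points, k, rng):
    """k-means++-style seeding (illustrative stand-in): each new centroid is
    sampled with probability proportional to its squared distance from the
    nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def kmeans_iterations(points, centroids, max_iter=100):
    """Run Lloyd's algorithm and return (iterations, labels).

    Assumed convention: one iteration is one assignment pass; the count
    includes the final pass that confirms no assignment changed.
    """
    centroids = list(centroids)  # do not mutate the caller's list
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return iteration, labels
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return max_iter, labels


def make_blobs(rng, centers, n_per_cluster=50, spread=0.5):
    """Hypothetical synthetic data: Gaussian blobs around the given centers."""
    return [
        tuple(rng.gauss(m, spread) for m in center)
        for center in centers
        for _ in range(n_per_cluster)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_blobs(rng, centers=[(0, 0), (5, 5), (0, 5)])
    for name, init in [("random", random_init), ("k-means++", plus_plus_init)]:
        counts = [kmeans_iterations(data, init(data, 3, rng))[0] for _ in range(20)]
        print(f"{name}: mean iterations = {sum(counts) / len(counts):.2f}")
```

Averaging the count over repeated runs, as in the loop above, is one simple way to compare initializations that involve randomness; a deterministic initialization, such as a hierarchical one, would need only a single run per dataset.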
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Amorim, R.C. (2013). An Empirical Evaluation of Different Initializations on the Number of K-Means Iterations. In: Batyrshin, I., González Mendoza, M. (eds) Advances in Artificial Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37807-2_2
DOI: https://doi.org/10.1007/978-3-642-37807-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37806-5
Online ISBN: 978-3-642-37807-2
eBook Packages: Computer Science (R0)