Abstract
In this paper, quantization errors of individual variables in the k-means quantization algorithm are investigated with respect to scaling factors, variable dependency, and distribution characteristics. It is observed that Z-norm standardization limits the average quantization error per variable to the unit range. Two measures, quantization quality and effective number of quantization points, are proposed for evaluating the goodness of quantization of individual variables. Both measures are invariant with respect to the scaling and variance of the variables. Comparing these measures between variables gives a sense of the relative importance of the variables.
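The observation about standardization can be illustrated with a small sketch. It is not the paper's implementation, and the abstract does not define the two proposed measures, so they are not reproduced here. The sketch assumes that the quantization error of a variable is the mean squared difference between each sample and its assigned centroid along that variable, that Z-norm means scaling to zero mean and unit population variance, and that plain Lloyd iterations stand in for k-means. The data set and all function names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def zscore(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def nearest(row, centroids):
    """Index of the centroid closest to row in squared Euclidean distance."""
    dists = [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in centroids]
    return dists.index(min(dists))


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns (centroids, labels).

    The returned centroids are the means of the returned label groups.
    """
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [nearest(r, centroids) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [fmean(col) for col in zip(*members)]
    return centroids, labels


def per_variable_error(rows, centroids, labels):
    """Mean squared quantization error, reported separately per variable."""
    n, d = len(rows), len(rows[0])
    return [
        sum((r[v] - centroids[lab][v]) ** 2 for r, lab in zip(rows, labels)) / n
        for v in range(d)
    ]


# Hypothetical synthetic data: variable 0 has two separated groups,
# variable 1 is pure noise on a roughly 100x larger scale.
rng = random.Random(1)
data = [[rng.gauss(g * 5.0, 1.0), rng.gauss(0.0, 100.0)]
        for g in (0, 1) for _ in range(100)]

z = zscore(data)
centroids, labels = kmeans(z, k=2)
errors = per_variable_error(z, centroids, labels)
print(errors)  # one value per variable
```

Under these assumptions each per-variable error lies between 0 and 1: every centroid is the mean of its assigned samples, so the within-cluster variance of a variable cannot exceed its total variance, which standardization has set to 1. Without standardization, the large-scale noise variable would dominate the distance computation regardless of how informative it is.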
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vesanto, J. (2001). Importance of Individual Variables in the k-Means Algorithm. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_54
DOI: https://doi.org/10.1007/3-540-45357-1_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive