Importance of Individual Variables in the k-Means Algorithm

Vesanto, Juha

doi:10.1007/3-540-45357-1_54

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

In this paper, the quantization errors of individual variables in the k-means quantization algorithm are investigated with respect to scaling factors, variable dependency, and distribution characteristics. It is observed that Z-norm standardization limits the average quantization error per variable to the unit range. Two measures, quantization quality and effective number of quantization points, are proposed for evaluating the goodness of quantization of individual variables. Both measures are invariant with respect to the scaling and variances of the variables. Comparing these measures between variables gives a sense of the relative importance of each variable.
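
The observation about standardization can be illustrated with a minimal sketch. It assumes the per-variable quantization error is the mean squared distance between each sample and its assigned centroid, taken one variable at a time; the abstract does not give the paper's exact definition, and the function names `zscore`, `kmeans`, and `per_variable_error` are hypothetical. The sketch does not implement the two proposed measures, since their definitions are not stated here. Under these assumptions, once every variable has unit variance and each centroid is the mean of its assigned samples, the within-cluster error of a variable cannot exceed its total variance, so it lies between 0 and 1.

```python
import numpy as np

def zscore(X):
    # Standardize each variable (column) to zero mean and unit variance.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=50, seed=0):
    # Plain Lloyd iterations; initial centroids are k distinct samples.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (squared Euclidean).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def per_variable_error(X, centroids, labels):
    # Mean squared quantization error, computed separately per variable.
    return ((X - centroids[labels]) ** 2).mean(axis=0)

# Synthetic data (an assumption for illustration): three independent
# Gaussian variables whose scales differ by factors of ten.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(500, 3)) * np.array([1.0, 10.0, 100.0])
X_std = zscore(X_raw)

for name, X in [("raw", X_raw), ("standardized", X_std)]:
    centroids, labels = kmeans(X, k=5)
    print(name, per_variable_error(X, centroids, labels))
```

On the raw data the errors are on very different scales and are hard to compare across variables, whereas after standardization all of them fall in the unit range. This is what makes a scale-invariant comparison between variables possible.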

Author information

Authors and Affiliations

Neural Networks Research Centre, Helsinki University of Technology, P.O. Box 5400, 02015 HUT, Finland
Juha Vesanto

Editor information

Editors and Affiliations

Dept. of Computer Science and Information Systems, The University of Hong Kong, Pokfulam, Hong Kong, China
David Cheung
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Graham J. Williams
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong, China
Qing Li

About this paper

Cite this paper

Vesanto, J. (2001). Importance of Individual Variables in the k-Means Algorithm. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_54

DOI: https://doi.org/10.1007/3-540-45357-1_54
Published: 11 April 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive
