Importance of Individual Variables in the k-Means Algorithm

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

In this paper, quantization errors of individual variables in the k-means quantization algorithm are investigated with respect to scaling factors, variable dependency, and distribution characteristics. It is observed that Z-norm standardization limits the average quantization error per variable to the unit range. Two measures, quantization quality and effective number of quantization points, are proposed for evaluating the goodness of quantization of individual variables. Both measures are invariant with respect to the scaling and variances of the variables. By comparing these measures between variables, a sense of the relative importance of variables is gained.
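The unit-range observation can be illustrated with a minimal sketch. The code below is not the paper's implementation, and it does not reproduce the two proposed measures, which the abstract does not define. It assumes plain Lloyd-style k-means, z-score standardization with the population standard deviation, and a hypothetical synthetic data set (`make_data`) with one bimodal variable and one large-scale noise variable. Under these assumptions, each variable's mean squared quantization error falls between 0 and 1, because every non-empty centroid is the mean of its members and the within-cluster variance of a variable cannot exceed its total variance.

```python
import random
from statistics import fmean, pstdev


def make_data(n=200, seed=1):
    """Hypothetical data: variable 0 is bimodal, variable 1 is large-scale noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        center = -5.0 if i % 2 == 0 else 5.0
        rows.append([rng.gauss(center, 1.0), rng.gauss(0.0, 100.0)])
    return rows


def zscore(rows):
    """Standardize each column to zero mean and unit population variance.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sqdist(row, centroids[j]))
                  for row in rows]
        # Update step: each non-empty cluster's centroid becomes its mean.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [fmean(col) for col in zip(*members)]
    return centroids, labels


def per_variable_error(rows, centroids, labels):
    """Mean squared quantization error, computed separately for each variable."""
    n = len(rows)
    dims = len(rows[0])
    return [sum((row[d] - centroids[lab][d]) ** 2
                for row, lab in zip(rows, labels)) / n
            for d in range(dims)]


if __name__ == "__main__":
    z = zscore(make_data())
    centroids, labels = kmeans(z, k=2)
    for d, err in enumerate(per_variable_error(z, centroids, labels)):
        print(f"variable {d}: quantization error = {err:.3f}")
```

With data like this, one would expect the clustered variable to end up with a much smaller error than the noise variable, which is the kind of between-variable comparison the abstract describes. The exact values depend on the initialization and on the data.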

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vesanto, J. (2001). Importance of Individual Variables in the k-Means Algorithm. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_54

  • DOI: https://doi.org/10.1007/3-540-45357-1_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41910-5

  • Online ISBN: 978-3-540-45357-4

  • eBook Packages: Springer Book Archive
