A Non-parametric Method for Data Clustering with Optimal Variable Weighting

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2006 (IDEAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Abstract

Cluster analysis in data mining often deals with large-scale, high-dimensional data that contain masking variables, so removing non-contributing variables is important both for accurate cluster recovery and for proper interpretation of the clustering results. Although the weights obtained by variable weighting methods can be used for variable selection (or elimination), they alone hardly provide a clear guide for selecting variables for subsequent analysis. In addition, variable selection and variable weighting are highly interrelated with the choice of the number of clusters. In this paper, we propose a non-parametric data clustering method, based on W-k-means type clustering, that makes an automated and joint decision on selecting variables, determining variable weights, and deciding the number of clusters. Conclusions are drawn from computational experiments with random data and real-life data.
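
As an illustration of the variable weighting that W-k-means type clustering builds on, the sketch below computes one weight per variable for a fixed partition. It is not the method proposed in this paper: it assumes the weight update usually attributed to the W-k-means formulation, w_j = 1 / sum_t (D_j / D_t)^(1/(beta - 1)), where D_j is the within-cluster squared dispersion on variable j. The function name `variable_weights`, the exponent `beta`, and the toy data are hypothetical choices made for this example.

```python
def variable_weights(points, labels, centers, beta=2.0):
    """Sketch of a W-k-means style weight update for a fixed partition.

    Assumption: w_j = 1 / sum_t (D_j / D_t) ** (1 / (beta - 1)), with
    D_j the within-cluster squared dispersion on variable j and beta > 1.
    """
    n_vars = len(points[0])
    # D[j]: total squared deviation from the assigned center on variable j.
    dispersion = [0.0] * n_vars
    for point, label in zip(points, labels):
        for j in range(n_vars):
            dispersion[j] += (point[j] - centers[label][j]) ** 2

    exponent = 1.0 / (beta - 1.0)
    weights = [0.0] * n_vars
    for j in range(n_vars):
        if dispersion[j] == 0.0:
            # Convention assumed here: a zero-dispersion variable gets weight 0.
            continue
        weights[j] = 1.0 / sum(
            (dispersion[j] / d) ** exponent for d in dispersion if d > 0.0
        )
    return weights


# Toy data (hypothetical): variable 0 separates the two clusters,
# variable 1 behaves like a masking (noise) variable.
points = [(0.0, 1.0), (1.0, 9.0), (10.0, 2.0), (11.0, 8.0)]
labels = [0, 0, 1, 1]
centers = [(0.5, 5.0), (10.5, 5.0)]
weights = variable_weights(points, labels, centers)
print(weights)
```

In this toy case the informative variable receives almost all of the weight and the weights sum to one. Small weights of this kind suggest which variables contribute little, but, as the abstract notes, they do not by themselves say where to cut for variable selection.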

This work was supported by the Korea Research Foundation Grant (KRF-2003-041-D00629).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chung, JW., Choi, IC. (2006). A Non-parametric Method for Data Clustering with Optimal Variable Weighting. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_97

  • DOI: https://doi.org/10.1007/11875581_97

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45485-4

  • Online ISBN: 978-3-540-45487-8

  • eBook Packages: Computer Science, Computer Science (R0)
