Validation indices for projective clustering

  • Research Article
Frontiers of Computer Science in China

Abstract

Cluster validation, the process of evaluating the performance of clustering algorithms under varying input conditions, is a major issue in cluster analysis for data mining. Many existing validity indices are designed for clustering results on low-dimensional data. In high-dimensional data, however, many of the dimensions are irrelevant, and clusters usually exist only in projected subspaces spanned by different combinations of dimensions. This paper presents a solution to the problem of cluster validation for projective clustering. We propose two new measures of the intracluster compactness and intercluster separation of projected clusters. Based on these measures and on conventional indices, we present three new cluster validity indices. Combined with a fuzzy projective clustering algorithm, the new indices are used to determine the number of projected clusters in high-dimensional data. The suitability of our proposal is demonstrated through an empirical study on synthetic and real-world datasets.
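The abstract does not give the formulas for the new measures, so what follows is only a sketch of the general idea under stated assumptions. It computes the conventional Xie-Beni index (membership-weighted compactness divided by the minimum separation between centres; lower is better) from fuzzy memberships and cluster centres, and optionally scales each dimension by a per-cluster weight matrix `W` as a hypothetical stand-in for the subspace of a projected cluster. The function name `xie_beni`, the parameter `W`, and the choice to average two clusters' weights when measuring their separation are illustrative assumptions, not the indices proposed in this paper.

```python
import numpy as np


def xie_beni(X, U, V, W=None, m=2.0):
    """Xie-Beni-style validity index (lower is better).

    X: (n, D) data points; U: (c, n) fuzzy memberships; V: (c, D) cluster centres.
    W: optional (c, D) per-cluster dimension weights. This weighting is a
    hypothetical illustration of a "projected" variant, not the paper's index.
    With W=None every weight is 1, which gives the classical Xie-Beni index.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    c, D = V.shape
    if c < 2:
        raise ValueError("separation is undefined for fewer than two clusters")
    n = X.shape[0]
    W = np.ones((c, D)) if W is None else np.asarray(W, dtype=float)

    # Compactness: membership-weighted squared distances from each point to
    # each centre, with every dimension scaled by that cluster's weight.
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2      # shape (c, n, D)
    dist2 = np.einsum("cnd,cd->cn", diff2, W)          # shape (c, n)
    compactness = np.sum((U ** m) * dist2)

    # Separation: smallest weighted squared distance between two centres,
    # using the average of the two clusters' weights (an assumption).
    sep = np.inf
    for i in range(c):
        for k in range(i + 1, c):
            w = 0.5 * (W[i] + W[k])
            sep = min(sep, float(np.sum(w * (V[i] - V[k]) ** 2)))

    return compactness / (n * sep)
```

To choose the number of clusters, one would run the clustering algorithm for each candidate number of clusters, evaluate the index on each result, and keep the candidate with the smallest value. When the weights of a cluster are concentrated on its relevant dimensions, spread along an irrelevant dimension no longer counts against compactness, which is the intuition behind validating clusters in their projected subspaces rather than in the full space.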

Author information

Correspondence to Lifei Chen.

About this article

Cite this article

Chen, L., He, S. & Jiang, Q. Validation indices for projective clustering. Front. Comput. Sci. China 3, 477–484 (2009). https://doi.org/10.1007/s11704-009-0051-1

  • DOI: https://doi.org/10.1007/s11704-009-0051-1
