Abstract
Cluster validation, the process of evaluating the performance of clustering algorithms under varying input conditions, is a major issue in cluster analysis for data mining. Many existing validity indices address clustering results on low-dimensional data. In high-dimensional data, however, many of the dimensions are irrelevant, and clusters usually exist only in projected subspaces spanned by different combinations of dimensions. This paper presents a solution to the problem of cluster validation for projective clustering. We propose two new measurements for the intracluster compactness and intercluster separation of projected clusters. Based on these measurements and the conventional indices, three new cluster validity indices are presented. Combined with a fuzzy projective clustering algorithm, the new indices are used to determine the number of projected clusters in high-dimensional data. The suitability of our proposal is demonstrated through an empirical study using synthetic and real-world datasets.
Cite this article
Chen, L., He, S. & Jiang, Q. Validation indices for projective clustering. Front. Comput. Sci. China 3, 477–484 (2009). https://doi.org/10.1007/s11704-009-0051-1
DOI: https://doi.org/10.1007/s11704-009-0051-1