Abstract
Cluster validation is the process of evaluating the performance of clustering algorithms under varying input conditions. This paper presents a new solution to the problem of cluster validation in high-dimensional applications. We examine the applicability of conventional cluster validity indices in evaluating the results of high-dimensional clustering and propose new indices that can be applied to high-dimensional datasets. We also propose an algorithm for automatically determining cluster dimensionality. By using the proposed indices and the algorithm, we can dispense with the input parameters that PROCLUS requires. Experimental studies show that the proposed cluster validity indices yield better cluster validation performance than conventional indices.
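The abstract does not say which conventional indices are examined or how the proposed ones are defined. The sketch below therefore shows only one widely used conventional index, the Davies-Bouldin index, and it is not the authors' proposed index. It measures scatter and separation with Euclidean distance over all dimensions, which is the full-space assumption that high-dimensional clustering calls into question. The helper `best_candidate` is hypothetical. It illustrates the general idea of replacing a user-supplied parameter, such as a cluster count, with the candidate whose partition scores best on a validity index.

```python
import math
from typing import Sequence

Point = Sequence[float]


def centroid(points: list[Point]) -> list[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance over all dimensions (full space, not a subspace)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(clusters: list[list[Point]]) -> float:
    """Davies-Bouldin index of a hard partition; lower values are better.

    S_i  = mean distance from the points of cluster i to its centroid (scatter)
    M_ij = distance between the centroids of clusters i and j (separation)
    DB   = (1/k) * sum_i max_{j != i} (S_i + S_j) / M_ij
    Assumes no two clusters share the same centroid (M_ij > 0).
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("the index needs at least two clusters")
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(euclidean(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, cents)
    ]
    total = 0.0
    for i in range(k):
        # Ratio against the most similar (worst-separated) other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


def best_candidate(candidates: dict[int, list[list[Point]]]) -> int:
    """Hypothetical helper: return the key (e.g. cluster count) whose partition has the lowest index."""
    return min(candidates, key=lambda key: davies_bouldin(candidates[key]))
```

For example, six points forming three tight pairs at x = 0, 10 and 20 score 0.1 when split into the three pairs, but about 0.37 when the first two pairs are merged, so `best_candidate` picks the 3-cluster partition.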
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, M., Yoo, H., Ramakrishna, R.S. (2004). Cluster Validation for High-Dimensional Datasets. In: Bussler, C., Fensel, D. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2004. Lecture Notes in Computer Science, vol 3192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30106-6_18
DOI: https://doi.org/10.1007/978-3-540-30106-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22959-9
Online ISBN: 978-3-540-30106-6
eBook Packages: Springer Book Archive