Abstract
This paper presents M-FastMap, a modified FastMap algorithm for visual cluster validation in data mining. In visual cluster validation with FastMap, clusters are first generated from a database with a clustering algorithm. The FastMap algorithm then projects the clusters onto a 2-dimensional (2D) or 3-dimensional (3D) space, where they are displayed with different colors and/or symbols. From this display, a human can visually examine the separation of the clusters. The method follows the principle that if a cluster is separate from the others in the projected 2D (or 3D) space, it is also separate from them in the original high-dimensional space (the converse does not hold). The modified FastMap algorithm improves the quality of visual cluster validation by selecting the pivot objects (which define the projection axes) so as to optimize the separation of clusters in the 2D (or 3D) space. A comparison study shows that the modified FastMap algorithm can produce better visualization results than the original FastMap algorithm.
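The projection step described above can be illustrated with a minimal sketch of the original FastMap procedure (Faloutsos and Lin, 1995). It is not the M-FastMap pivot selection, which the abstract does not specify. The function name `fastmap`, the parameters `k` and `seed`, and the use of Euclidean distances are assumptions made for illustration.

```python
import numpy as np

def fastmap(points, k=2, seed=0):
    """Project an (n x d) array onto k axes with the original FastMap heuristic.

    Illustrative sketch only: Euclidean distances are assumed, and pivots are
    chosen by the standard "farthest object" heuristic, not by M-FastMap.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Squared Euclidean distances between all pairs of objects.
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    coords = np.zeros((n, k))
    rng = np.random.default_rng(seed)
    for axis in range(k):
        # Pivot heuristic: start anywhere, then jump to the farthest object twice.
        start = rng.integers(n)
        a = int(np.argmax(d2[start]))
        b = int(np.argmax(d2[a]))
        dab2 = d2[a, b]
        if dab2 <= 1e-12:
            break  # all residual distances are (numerically) zero
        # Cosine-law projection of every object onto the line through a and b.
        x = (d2[a] + dab2 - d2[b]) / (2.0 * np.sqrt(dab2))
        coords[:, axis] = x
        # Residual squared distances in the hyperplane orthogonal to that line.
        d2 = np.maximum(d2 - (x[:, None] - x[None, :]) ** 2, 0.0)
    return coords
```

For Euclidean input, each axis is an orthogonal projection, so distances in the projected space never exceed the original distances. This is why separation seen in the 2D display implies separation in the original space, but not the reverse. To inspect clusters visually, the two columns of the returned array can be passed to any scatter-plot routine with one color per cluster label.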
Supported in part by RGC Grant No. 7132/00P and HKU CRCG Grant Nos. 10203501, 10203907 and 10203408.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ng, M., Huang, J. (2002). M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_22
DOI: https://doi.org/10.1007/3-540-47887-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive