Abstract
Clustering analysis is an unsupervised method for finding hidden structure in datasets and has been widely used in various fields. However, it is difficult for users to understand, evaluate, and explain clustering results in spaces with more than three dimensions. Although high-dimensional visualization techniques can express clustering results well, they still have significant limitations. In this paper, a visual cluster analysis method based on the minimum distance spectrum (MinDS) is proposed, aimed at reducing the difficulties of clustering multidimensional datasets. First, the concept of MinDS is defined based on the distances between high-dimensional data points. MinDS can map any dataset from a high-dimensional space to a lower-dimensional one to determine whether the dataset is separable. Next, a clustering method that automatically determines the number of categories is designed based on MinDS. This method can cluster not only datasets with clear boundaries but also datasets with fuzzy boundaries, through an edge corrosion strategy based on the energy of each data point. In addition, strategies for removing noise and identifying outliers are designed to clean datasets according to the characteristics of MinDS. The experimental results validate the feasibility and effectiveness of the proposed schemes and show that the proposed approach is simple, stable, and efficient, and can achieve multidimensional visual cluster analysis of complex datasets.
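The abstract does not give the formal definition of MinDS, so the following is only an illustrative sketch of the general idea of reducing a high-dimensional dataset to a one-dimensional sequence of minimum distances and reading separability from it. It uses a stand-in technique, a Prim-style nearest-neighbor ordering (closely related to single-linkage clustering), and should not be read as the authors' method. The function names `min_distance_sequence` and `split_at_gaps`, the `threshold` parameter, and the sample points are all hypothetical.

```python
import math

def min_distance_sequence(points, start=0):
    """Prim-style ordering: repeatedly visit the unvisited point that is
    closest to the already-visited set, recording that minimum distance.

    Returns (order, dists); dists[k] is the distance at which order[k + 1]
    was attached. This is an assumed stand-in, not the paper's MinDS.
    """
    n = len(points)
    # best[i] = distance from point i to its nearest visited point so far
    best = [math.dist(points[start], p) for p in points]
    visited = [False] * n
    visited[start] = True
    order, dists = [start], []
    for _ in range(n - 1):
        nxt = min((i for i in range(n) if not visited[i]), key=best.__getitem__)
        visited[nxt] = True
        order.append(nxt)
        dists.append(best[nxt])
        for i in range(n):
            if not visited[i]:
                best[i] = min(best[i], math.dist(points[nxt], points[i]))
    return order, dists

def split_at_gaps(order, dists, threshold):
    """Cut the ordering wherever the attachment distance exceeds threshold."""
    clusters, current = [], [order[0]]
    for idx, d in zip(order[1:], dists):
        if d > threshold:
            clusters.append(current)
            current = []
        current.append(idx)
    clusters.append(current)
    return clusters

# Two well-separated groups in 4-D (made-up data for illustration only)
points = [
    (0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.1, 0.0), (0.0, 0.2, 0.0, 0.1),
    (5.0, 5.0, 5.0, 5.0), (5.1, 5.0, 4.9, 5.0), (5.0, 5.2, 5.0, 5.1),
]
order, dists = min_distance_sequence(points)
clusters = split_at_gaps(order, dists, threshold=1.0)
```

Plotting `dists` against position in `order` gives a one-dimensional profile in which a large spike suggests a gap between groups. In this sketch the number of clusters follows from the number of spikes above `threshold` rather than being fixed in advance, although the threshold itself is still chosen by hand here.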
Cite this article
Lu, Z., Liu, C., Zhang, Q. et al. Visual analytics for the clustering capability of data. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-013-4832-7
DOI: https://doi.org/10.1007/s11432-013-4832-7