Visual analytics for the clustering capability of data

Lu, ZhiMao; Liu, Chen; Zhang, Qi; Zhang, ChunXiang; Fan, DongMei; Yang, Peng

doi:10.1007/s11432-013-4832-7

Research Paper
Special Focus
Published: 24 May 2013

Science China Information Sciences, Volume 56, pages 1–14 (2013)

ZhiMao Lu¹,²,
Chen Liu¹,
Qi Zhang¹,
ChunXiang Zhang³,
DongMei Fan¹ &
Peng Yang¹

Abstract

Clustering analysis is an unsupervised method for finding hidden structure in datasets and is widely used in many fields. However, users often find it difficult to understand, evaluate, and explain clustering results in spaces of more than three dimensions. Although high-dimensional visualization techniques can express clustering results well, they still have significant limitations. This paper proposes a visual cluster analysis method based on the minimum distance spectrum (MinDS), aimed at reducing the difficulty of clustering multidimensional datasets. First, MinDS is defined in terms of the distances between high-dimensional data points. MinDS can map any dataset from a high-dimensional space to a lower-dimensional one, making it possible to determine whether the dataset is separable. Next, a MinDS-based clustering method is designed that automatically determines the number of categories. The method can cluster not only datasets with clear boundaries but also datasets with fuzzy boundaries, using an edge corrosion strategy based on the energy of each data point. In addition, strategies for removing noise and identifying outliers are designed to clean datasets according to the characteristics of MinDS. Experimental results validate the feasibility and effectiveness of the proposed schemes and show that the approach is simple, stable, and efficient, and that it can achieve multidimensional visual cluster analysis of complex datasets.
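
The abstract does not give the formal definition of MinDS, so the following is only an illustrative sketch of the general idea of mapping a high-dimensional dataset to a one-dimensional sequence of minimum distances. It assumes a Prim-style ordering: starting from an arbitrary point, the unvisited point closest to the visited set is added repeatedly, and that minimum distance is recorded. The function names `min_distance_spectrum` and `split_by_gaps`, the Euclidean metric, and the `factor` threshold rule are all hypothetical choices made for this example and may differ from the authors' method.

```python
import numpy as np

def min_distance_spectrum(points):
    """Assumed construction (not the paper's definition): visit points in a
    Prim-style order and record the distance from each newly added point to
    the already-visited set."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Full pairwise Euclidean distance matrix; adequate for small datasets.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    nearest = dist[0].copy()  # distance from every point to the visited set
    order, spectrum = [0], []
    for _ in range(n - 1):
        candidates = np.where(visited, np.inf, nearest)
        j = int(np.argmin(candidates))
        order.append(j)
        spectrum.append(float(candidates[j]))
        visited[j] = True
        nearest = np.minimum(nearest, dist[j])
    return order, np.array(spectrum)

def split_by_gaps(order, spectrum, factor=3.0):
    """Hypothetical rule: start a new cluster wherever the spectrum value
    exceeds `factor` times the median spectrum value."""
    threshold = factor * np.median(spectrum)
    labels = np.zeros(len(order), dtype=int)
    current = 0
    for step, d in enumerate(spectrum):
        if d > threshold:
            current += 1
        labels[order[step + 1]] = current
    return labels

if __name__ == "__main__":
    # Two groups of 4-dimensional points separated along the first axis.
    group_a = [[0.1 * i, 0.0, 0.0, 0.0] for i in range(10)]
    group_b = [[10.0 + 0.1 * i, 0.0, 0.0, 0.0] for i in range(10)]
    order, spectrum = min_distance_spectrum(group_a + group_b)
    print(spectrum)
    print(split_by_gaps(order, spectrum))
```

Plotting the recorded sequence against the visiting step gives a one-dimensional picture of the data in which a large spike suggests a gap between groups, so under these assumptions the number of groups can be read off without being fixed in advance.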

Author information

Authors and Affiliations

Pattern Recognition and Natural Computation Laboratory, Harbin Engineering University, Harbin, 150001, China
ZhiMao Lu, Chen Liu, Qi Zhang, DongMei Fan & Peng Yang
School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
ZhiMao Lu
School of Software, Harbin University of Science and Technology, Harbin, 150080, China
ChunXiang Zhang

Corresponding author

Correspondence to ZhiMao Lu.

About this article

Cite this article

Lu, Z., Liu, C., Zhang, Q. et al. Visual analytics for the clustering capability of data. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-013-4832-7

Received: 18 January 2013
Accepted: 02 February 2013
Published: 24 May 2013
Issue Date: May 2013
DOI: https://doi.org/10.1007/s11432-013-4832-7
