Visual analytics for the clustering capability of data

  • Research Paper
  • Special Focus
Science China Information Sciences

Abstract

Clustering analysis is an unsupervised method for finding hidden structure in datasets and is widely used in many fields. However, users often find it difficult to understand, evaluate, and explain clustering results in spaces of more than three dimensions. Although high-dimensional visualization techniques can convey clustering results well, they still have significant limitations. This paper proposes a visual cluster analysis method based on the minimum distance spectrum (MinDS), aimed at easing the problems of clustering multidimensional datasets. First, MinDS is defined in terms of the distances between high-dimensional data points. MinDS can map any dataset from a high-dimensional space to a lower-dimensional one, making it possible to determine whether the dataset is separable. Next, a clustering method that automatically determines the number of categories is designed based on MinDS. This method can cluster not only datasets with clear boundaries but also datasets with fuzzy boundaries, using an edge corrosion strategy based on the energy of each data point. In addition, strategies for removing noise and identifying outliers are designed to clean datasets according to the characteristics of MinDS. The experimental results validate the feasibility and effectiveness of the proposed schemes and show that the approach is simple, stable, and efficient, and can achieve multidimensional visual cluster analysis of complex datasets.
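
The abstract does not give the formal definition of MinDS, so the following is only an illustrative sketch of one plausible reading, not the authors' algorithm. It assumes a Prim-style ordering in which each newly visited point contributes its minimum distance to the already-visited points, so a dataset of any dimension becomes a one-dimensional sequence whose large peaks suggest gaps between separable groups. The function names `min_distance_spectrum` and `split_at_peaks`, the Euclidean metric, and the fixed `threshold` are all assumptions made for this example.

```python
import math

def min_distance_spectrum(points):
    """Return (order, spectrum) for a list of equal-length coordinate tuples.

    Assumed construction (not taken from the paper): start at point 0 and
    repeatedly visit the unvisited point closest to the visited set,
    recording that minimum distance as the next spectrum value.
    """
    n = len(points)
    if n == 0:
        return [], []
    order, spectrum = [0], [0.0]
    visited = [False] * n
    visited[0] = True
    # best[i] is the distance from point i to its nearest visited point
    best = [math.dist(points[0], p) for p in points]
    for _ in range(n - 1):
        nxt = min((i for i in range(n) if not visited[i]), key=lambda i: best[i])
        order.append(nxt)
        spectrum.append(best[nxt])
        visited[nxt] = True
        for i in range(n):
            if not visited[i]:
                best[i] = min(best[i], math.dist(points[nxt], points[i]))
    return order, spectrum

def split_at_peaks(order, spectrum, threshold):
    """Cut the ordering wherever the spectrum exceeds a hypothetical threshold."""
    clusters, current = [], []
    for idx, gap in zip(order, spectrum):
        if current and gap > threshold:
            clusters.append(current)
            current = []
        current.append(idx)
    if current:
        clusters.append(current)
    return clusters

# Two well-separated groups in four dimensions
data = [(0, 0, 0, 0), (0.1, 0, 0, 0), (0, 0.1, 0, 0), (5, 5, 5, 5), (5.1, 5, 5, 5)]
order, spectrum = min_distance_spectrum(data)
clusters = split_at_peaks(order, spectrum, threshold=1.0)
```

In this sketch the number of groups falls out of the number of peaks above the threshold instead of being supplied in advance, which is the kind of behavior the abstract attributes to the MinDS-based method. The edge corrosion, noise removal, and outlier strategies are not modeled here.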

Author information

Corresponding author

Correspondence to ZhiMao Lu.

About this article

Cite this article

Lu, Z., Liu, C., Zhang, Q. et al. Visual analytics for the clustering capability of data. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-013-4832-7

  • DOI: https://doi.org/10.1007/s11432-013-4832-7
