Cluster-aware arrangement of the parallel coordinate plots
Introduction
Multi-dimensional data are common in many application domains. Understanding the relationships among multiple dimensions is important when a specialist investigates domain-specific phenomena. However, multi-dimensional data exceed human comprehension, so effective tools are needed to support their analysis. Several visualization methods for multi-dimensional data have been proposed, such as multi-dimensional scaling (MDS), scatterplot matrices (SPLOM) and parallel coordinate plots (PCP). MDS produces a good overview of multi-dimensional data while maintaining the level of similarity between individual cases of a dataset. Its main drawback is that the original information is lost. SPLOM provides a simple, familiar, and clear view of the data distributions. However, because of its tiled 2D layout, it is difficult to discern relationships that involve more than two variables. Finally, parallel coordinate plots are constructed by placing the axes parallel to one another in a 2D Cartesian plane. They show the raw multi-dimensional data points as polylines spanning a set of parallel vertical axes: each polyline represents a sample and each axis represents a feature. They can convey all data dimensions in a single display while preserving the original data dimensions.
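The polyline construction described above can be illustrated with a minimal sketch. The function name `to_polylines` and the per-axis min-max normalization are assumptions made for this illustration; the text does not prescribe a particular scaling.

```python
def to_polylines(rows):
    """Turn numeric records into parallel-coordinates polylines.

    rows: list of equal-length numeric records (one record per sample).
    Returns, for each record, a list of (axis_index, y) vertices where y
    is the value min-max normalized to [0, 1] on that axis (an assumed,
    commonly used scaling).
    """
    n_axes = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(n_axes)]
    highs = [max(r[d] for r in rows) for d in range(n_axes)]

    def scale(value, d):
        span = highs[d] - lows[d]
        # A constant axis has no spread; place its points mid-axis.
        return 0.5 if span == 0 else (value - lows[d]) / span

    return [[(d, scale(r[d], d)) for d in range(n_axes)] for r in rows]


if __name__ == "__main__":
    for line in to_polylines([[1, 10], [3, 30], [2, 20]]):
        print(line)
```

Each returned vertex list can be passed directly to a line-drawing routine; the axis index serves as the horizontal position of the corresponding vertical axis.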
Parallel coordinates are good for presenting overviews of the whole raw data set and for showing relationships among the dimensions. Nevertheless, the order of the parallel vertical axes affects the patterns revealed by a parallel coordinates plot, because only the relationships between adjacent axes are visualized. A good dimension ordering expresses interrelationships clearly, while a poor ordering hides the relationships of interest. Reordering the dimensions is a straightforward way to improve the expressiveness of parallel coordinate plots. However, in the worst case N! possible axis orderings must be tried to reach a preferred visualization result. Hence, an effective method is needed to guide users in selecting promising dimension orderings.
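To make the growth of the search space concrete, the following sketch counts the orderings and performs an exhaustive search for small N. The names `distinct_orderings` and `best_ordering`, and the idea of scoring an ordering by summing a pairwise `score` over adjacent axes, are assumptions for illustration. Because an ordering and its mirror image place the same pairs of axes next to each other, the sketch counts each such pair of orderings once.

```python
from itertools import permutations
from math import factorial


def distinct_orderings(n):
    """Number of axis orderings, counting an order and its mirror once."""
    return factorial(n) // 2 if n > 1 else 1


def best_ordering(score, n):
    """Exhaustively search the orderings of n axes (small n only).

    score(i, j) rates how informative it is to place axes i and j next
    to each other; the ordering with the largest sum over adjacent
    pairs wins.
    """
    best, best_total = None, float("-inf")
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # mirror image of an ordering already considered
        total = sum(score(a, b) for a, b in zip(order, order[1:]))
        if total > best_total:
            best, best_total = order, total
    return best, best_total


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, distinct_orderings(n))
```

Already at twelve axes the exhaustive loop would have to visit hundreds of millions of orderings, which is why guided or heuristic orderings are of interest.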
Dimension ordering in parallel coordinates has long been a research topic. When the number of dimensions is relatively small, the orderings can be enumerated to visualize multiple paths. As N increases, however, this approach becomes impractical, because of either the computational complexity or the limited screen real estate. Zhang et al. proposed using Pearson's correlation coefficient to determine the axis alignment [1]. Wu et al. employed a correlation-based dimension sorting algorithm to generate a recommended axis order automatically, after which users can drag any axis to adjust the order for their application [2]. Qu et al. introduced weighted complete graphs to give an overview of the correlations among all dimensions and to help users determine the axis order in parallel coordinates [3]. However, these methods all focus on the data items themselves and neglect the category distributions across dimensions during dimension ordering.
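As a rough illustration of correlation-based ordering, the sketch below computes Pearson's correlation coefficient and builds an axis order greedily. It is not the algorithm of [1] or [2], whose details are not given here; the greedy chaining strategy, the function names, and the `start` parameter are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant dimension carries no linear relationship
    return cov / sqrt(var_x * var_y)


def greedy_axis_order(columns, start):
    """Chain axes so that each new axis is the unplaced one with the
    highest absolute correlation to the most recently placed axis.

    columns: dict mapping dimension name -> list of values.
    start:   name of the first axis (hypothetical parameter).
    """
    order = [start]
    remaining = [name for name in columns if name != start]
    while remaining:
        last = columns[order[-1]]
        nxt = max(remaining, key=lambda name: abs(pearson(last, columns[name])))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    data = {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10.5],
        "c": [5, 1, 4, 2, 3],
        "d": [5, 4, 3, 2, 1],
    }
    print(greedy_axis_order(data, start="a"))
```

A greedy chain costs only O(N²) correlation evaluations, but it can miss the globally best ordering, and it looks only at item-level values rather than category structure.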
To achieve a semantic dimension ordering, we propose a cluster-aware arrangement method for parallel coordinate plots and design a visualization framework for exploring the relationships between the dimensions of multi-dimensional data. We first use a hierarchical clustering scheme to identify the categories of interest across different dimensions, which are visualized by a group of icicle views. We also design a matrix map to show the correlation between different dimensions; this correlation is calculated by a cluster-aware method. Finally, we employ MDS to transform the dimensions and solve a traveling salesman problem (TSP) to order the dimensions of the parallel coordinates. The major contributions of this work can be summarized as follows:
- A cluster-aware arrangement method for parallel coordinate plots, which measures the relationships between different attribute axes based on the category distributions.
- A visual analysis system for high-dimensional data. The system integrates correlation calculation, correlation display, correlation analysis and interaction.
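The idea of ordering axes by category structure rather than by item-level values can be sketched as follows. This is not the paper's implementation: the per-dimension clustering (single linkage on 1D values, obtained by cutting at the largest gaps), the use of the Rand index as the cluster-aware similarity, and the brute-force shortest-path search standing in for the MDS and TSP steps are all assumptions chosen to keep the example short.

```python
from itertools import combinations, permutations


def gap_clusters(values, k):
    """Single-linkage clustering of 1D values into k groups.

    In one dimension, cutting the sorted values at the k-1 largest gaps
    yields the same groups as single-linkage hierarchical clustering.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    cut_positions = sorted(
        range(1, len(order)),
        key=lambda p: values[order[p]] - values[order[p - 1]],
        reverse=True,
    )[: k - 1]
    cuts = set(cut_positions)
    labels = [0] * len(values)
    current = 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            current += 1
        labels[idx] = current
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree
    (both group the pair together, or both separate it)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def order_axes(columns, k=2):
    """Order axes so that adjacent axes have similar cluster structure.

    Brute force over all orderings, so suitable for a few axes only.
    """
    names = list(columns)
    labels = {n: gap_clusters(columns[n], k) for n in names}
    dist = {
        (a, b): 1.0 - rand_index(labels[a], labels[b])
        for a in names for b in names if a != b
    }
    best = min(
        permutations(names),
        key=lambda p: sum(dist[a, b] for a, b in zip(p, p[1:])),
    )
    return list(best)


if __name__ == "__main__":
    data = {"x": [1, 2, 10, 11], "y": [1, 10, 2, 11], "z": [5, 6, 20, 21]}
    print(order_axes(data))
```

In the example data, `x` and `z` split the four items into the same two groups while `y` splits them differently, so any ordering the sketch returns keeps `x` and `z` adjacent.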
The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 gives a system overview. Section 4 describes the visualization techniques. Section 5 presents the evaluation. Section 6 offers conclusions and future work.
Related work
This section provides an overview of related work. We discuss two categories of relevant literature: parallel coordinate plots and dimension ordering.
System overview
As shown in Fig. 1, our visualization system contains four coordinated views: a parallel coordinates view, an icicle view, an MDS view and a matrix map view. A suite of intuitive interaction tools is also integrated to enable users to explore regions of interest interactively and effectively.
Fig. 2 shows the overall pipeline of our system. We use a parallel coordinate plot to display multi-dimensional data. A hierarchical clustering scheme is employed to identify the categories of interest
Visualization techniques
The cluster-aware correlation evaluation and the visual designs for the arrangement of parallel coordinates are detailed in this section.
Evaluations
In this section, we conduct three case studies on different datasets to evaluate the feasibility and applicability of our visual exploration system. We implement dimension ordering for parallel coordinate plots based on three different metrics: random ordering, item-aware correlation (the traditional approach), and cluster-aware correlation (ours). Accordingly, three dimension ordering schemes are available in our current system. We also conduct a pilot user study to
Conclusion and future work
We propose a cluster-aware arrangement of parallel coordinate plots and design an interactive visualization system. The system provides users with a variety of interactive tools to control the dimension ordering and observe the correlation between dimensions. It can be used for data exploration and also as an interactive demonstration platform. Experiments and user studies show that the category-based axis alignment method can help users better understand multidimensional
Acknowledgments
This work was supported by NSF of China Project No. 61303133 and No. 61303134, the Zhejiang Provincial Natural Science Foundation No. LY18F020024, the National Statistical Scientific Research Project No. 2015LD03, the Zhejiang Science & Technology Plan of China No. 2014C31057 and the First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics-Statistics).
References (29)
- et al. A network-based interface for the exploration of high-dimensional data spaces. Pacific Visualization Symposium (2012)
- et al. Telcovis: visual exploration of co-occurrence in urban human mobility based on telco data. IEEE Trans. Visual. Comput. Graph. (2015)
- et al. Visual analysis of the air pollution problem in Hong Kong. IEEE Trans. Visual. Comput. Graph. (2007)
- Printer graphics for clustering. J. Stat. Comput. Simul. (1975)
- et al. Parallel coordinates: a tool for visualizing multi-dimensional geometry. Proceedings of the First IEEE Conference on Visualization: Visualization ’90 (1990)
- et al. State of the art of parallel coordinates
- Star coordinates: a multi-dimensional visualization technique with uniform treatment of dimensions. In Proceedings of the IEEE Information Visualization Symposium, Late Breaking Hot Topics (2000)
- et al. Recursive pattern: a technique for visualizing very large amounts of data. Proceedings of the 6th Conference on Visualization ’95 (1995)
- et al. Pixel bar charts: a visualization technique for very large multi-attribute data sets. Inf. Visual. (2002)
- The use of faces to represent points in k-dimensional space graphically. J. Am. Stat. Assoc. (1973)
- Graphical Methods for Data Analysis
- Self-organized formation of topologically correct feature maps. Biol. Cybern.
- Principal component analysis. Springer Berlin
- Multidimensional scaling: I. Theory and method. Psychometrika