Cluster-aware arrangement of the parallel coordinate plots

https://doi.org/10.1016/j.jvlc.2017.10.003

Abstract

The dimension ordering of parallel coordinate plots has been widely studied with the aim of supporting insightful exploration of multi-dimensional data. However, few works consider the category distributions across dimensions or construct a dimension ordering that supports the visual exploration of clusters. We therefore propose a cluster-aware arrangement method for parallel coordinate plots and design a visualization framework for multi-dimensional data exploration. First, a hierarchical clustering scheme is employed to identify the categories of interest across different dimensions. We then design a group of icicle views to present the hierarchies of dimensions, whose colors also indicate the relationships between different categories. A cluster-aware correlation is defined to measure the relationships between attribute axes based on the distributions of categories. Furthermore, a matrix map is designed to present the relationships between dimensions, and the MDS method is employed to transform the dimensions into 2D coordinates in which the correlations among the dimensions are preserved. Finally, we solve the Traveling Salesman Problem (TSP) to obtain an automated dimension ordering of the parallel coordinate plots, which highlights the relations of categories across dimensions. A set of convenient interactions is also integrated into the visualization system, allowing users to gain insight into the multi-dimensional data from various perspectives. Extensive experimental results and user studies further demonstrate the usefulness of the cluster-aware arrangement of parallel coordinate plots.

Introduction

Multi-dimensional data are common in various application domains. Understanding the relationships among multiple dimensions is important when a specialist investigates domain-specific phenomena. However, multi-dimensional data exceed human comprehension, so effective tools are needed to support these capabilities. Several visualization methods for multi-dimensional data have been proposed in recent years, such as multi-dimensional scaling (MDS), scatterplot matrices (SPLOM) and parallel coordinate plots (PCP). MDS can produce a good overview of multi-dimensional data while maintaining the level of similarity between individual cases of a dataset in the object space. Its main drawback is that the original information is lost. SPLOM provides a simple, familiar, and clear view of the data distributions. However, because of its distributed 2D tiled layout, it is difficult to discern relationships that involve more than two variables. Finally, parallel coordinate plots are constructed by placing axes in parallel within a 2D Cartesian coordinate system embedded in the plane. Parallel coordinates show the raw multi-dimensional data points as polylines spanning a set of parallel vertical axes, where each polyline represents a sample and each axis represents a feature. They can convey all data dimensions in a single display while preserving the original data dimensions.
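The polyline construction described above can be illustrated with a minimal sketch. It assumes each record is a dictionary of numeric values and that every axis is min-max normalized to [0, 1]; the function name to_polylines and the normalization choice are illustrative assumptions, not details taken from the paper.

```python
def to_polylines(rows, order):
    """Map records to parallel-coordinate polylines.

    rows  : list of dicts holding numeric values (hypothetical input format)
    order : dimension names from the leftmost axis to the rightmost axis
    Returns one list of (axis_index, normalized_value) vertices per record.
    """
    lo = {d: min(r[d] for r in rows) for d in order}
    hi = {d: max(r[d] for r in rows) for d in order}

    def norm(d, v):
        # Min-max normalization (an assumption); constant axes map to the middle.
        span = hi[d] - lo[d]
        return (v - lo[d]) / span if span else 0.5

    return [[(x, norm(d, r[d])) for x, d in enumerate(order)] for r in rows]


rows = [{"a": 0, "b": 10}, {"a": 5, "b": 20}, {"a": 10, "b": 30}]
for line in to_polylines(rows, ["a", "b"]):
    print(line)
```

Changing the order argument changes which pairs of dimensions are joined by line segments, which is why the axis order determines the relationships a reader can see.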

Parallel coordinates are good for presenting overviews of the whole, raw data set, as well as for showing relationships among the dimensions. Nevertheless, the order of the parallel vertical axes affects the patterns revealed by a parallel coordinates plot, because only the relationships between adjacent axes are visualized. A good dimension ordering expresses interrelationships clearly, while a poor dimension ordering hides the relationships of interest. Reordering the dimensions is a straightforward means of improving the expressiveness of parallel coordinate plots. However, in the worst case N! possible axis orderings must be tried to reach a preferred visualization result. Hence, what is sorely needed is an effective method that can guide users in selecting promising dimension orderings.

Dimension ordering in parallel coordinates has long been a research topic. When the number of dimensions N is relatively small, the possible orderings can be enumerated and visualized as multiple paths. As N increases, however, this approach becomes impractical, either because of the computational complexity or because of the limited screen real estate. Zhang et al. propose using Pearson's correlation coefficient to determine the axis alignment [1]. Wu et al. employ a correlation-based dimension sorting algorithm to generate a recommended axis order automatically, after which users can drag any axis to adjust the order for different applications [2]. Qu et al. employ weighted complete graphs to give an overview of the correlations among all dimensions and to help users determine the axis order in parallel coordinates [3]. However, these approaches all focus on the data items themselves, and the category distributions across dimensions are neglected during dimension ordering.
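To make the correlation-based idea and the cost of enumeration concrete, the following minimal sketch scores every axis permutation by the sum of absolute Pearson correlations between adjacent axes and keeps the best one. It is not the algorithm of [1], [2] or [3]; the scoring rule and the names pearson and best_order are assumptions chosen for illustration.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # A constant column has no defined correlation; treat it as 0 here.
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def best_order(columns):
    """Exhaustively search all axis orders (N! candidates, so small N only)."""
    names = list(columns)
    corr = {(a, b): abs(pearson(columns[a], columns[b]))
            for a in names for b in names if a != b}

    def score(order):
        # Only adjacent axes are compared, mirroring what the plot can show.
        return sum(corr[order[i], order[i + 1]] for i in range(len(order) - 1))

    return max(permutations(names), key=score)


columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # perfectly correlated with "a"
    "c": [2, 5, 1, 4, 3],    # only weakly correlated with "a" and "b"
}
print(best_order(columns))
```

The exhaustive search is only feasible for a handful of axes: with N = 10 there are already 10! = 3,628,800 permutations to score, which is why heuristic or TSP-based orderings are used in practice.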

To achieve a semantic dimension ordering, we propose a cluster-aware arrangement method for parallel coordinate plots and design a visualization framework for exploring multi-dimensional data and the relationships between dimensions. We first use a hierarchical clustering scheme to identify the categories of interest across different dimensions, which are visualized by a group of icicle views. We also design a matrix map to show the correlation between different dimensions, where the correlation is calculated by a cluster-aware method. Finally, we employ the MDS method to transform the dimensions and solve the TSP to obtain the dimension ordering of the parallel coordinate plot. The major contributions of this work can be summarized as follows:

  • A cluster-aware arrangement method of the parallel coordinate plots which measures the relationships between different attribute axes based on the category distributions.

  • A visualization analysis system of high-dimensional data. The system integrates correlation calculation, correlation display, correlation analysis and interaction.

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 gives a system overview. Section 4 describes the visualization techniques. Section 5 presents the evaluation. Section 6 offers conclusions and future work.

Related work

This section provides an overview of related work. We discuss two categories of relevant literature: parallel coordinate plots and dimension ordering.

System overview

As shown in Fig. 1, our visualization system contains four coordinated views: a parallel coordinates view, an icicle view, an MDS view and a matrix map view. A suite of intuitive interaction tools is also integrated to enable users to explore regions of interest interactively and effectively.

Fig. 2 shows the overall pipeline of our system. We use a parallel coordinate plot to display multi-dimensional data. A hierarchical clustering scheme is employed to identify the categories of interest

Visualization techniques

The cluster-aware correlation evaluation and the visual designs for the arrangement of parallel coordinates are detailed in this section.
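The overall idea of ordering axes by how well their categories agree can be sketched as follows. The paper's actual cluster-aware correlation is not reproduced in this excerpt, so every concrete choice here is an assumption: each dimension is clustered in 1D by cutting the largest gaps between sorted values (which matches cutting a single-linkage dendrogram when the gap sizes are distinct), normalized mutual information between cluster labels stands in for the cluster-aware correlation, a nearest-neighbour heuristic stands in for solving the TSP, and the MDS step is omitted. All function names are hypothetical.

```python
from collections import Counter
from math import log


def gap_clusters(values, k):
    """Label 1D values with k clusters by cutting the k-1 largest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    cuts = set(sorted(range(1, len(order)),
                      key=lambda j: values[order[j]] - values[order[j - 1]],
                      reverse=True)[:k - 1])
    labels, current = [0] * len(values), 0
    for j, i in enumerate(order):
        if j in cuts:          # a large gap starts a new cluster
            current += 1
        labels[i] = current
    return labels


def nmi(a, b):
    """Normalized mutual information of two label lists (stand-in measure)."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    return mi / (ha * hb) ** 0.5 if ha > 0 and hb > 0 else 0.0


def similarity_matrix(labels):
    """Pairwise label agreement between all distinct dimensions."""
    return {(p, q): nmi(labels[p], labels[q])
            for p in labels for q in labels if p != q}


def order_axes(names, sim):
    """Nearest-neighbour path from every start axis; keep the best path.

    This is a heuristic, not an exact TSP solver.
    """
    best, best_score = None, -1.0
    for start in names:
        path, left, score = [start], set(names) - {start}, 0.0
        while left:
            nxt = max(sorted(left), key=lambda d: sim[path[-1], d])
            score += sim[path[-1], nxt]
            path.append(nxt)
            left.remove(nxt)
        if score > best_score:
            best, best_score = path, score
    return best


cols = {
    "x": [1, 2, 3, 4, 11, 12, 13, 14],
    "y": [20, 21, 22, 23, 50, 51, 52, 53],
    "z": [1, 9, 2, 10, 1.5, 9.5, 2.5, 10.5],
    "w": [5, 30, 6, 31, 7, 32, 8, 33],
}
labels = {d: gap_clusters(v, 2) for d, v in cols.items()}
print(order_axes(list(cols), similarity_matrix(labels)))
```

In this toy data, x and y split the samples into the same two groups, as do z and w, so a category-based ordering places each pair on adjacent axes even though the raw values differ in scale.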

Evaluations

In this section, we conduct three case studies on different datasets to evaluate the feasibility and applicability of our visual exploration system. We implement dimension ordering approaches for parallel coordinate plots based on three different metrics: random ordering, the traditional item-aware correlation, and our cluster-aware correlation. Accordingly, three dimension ordering schemes are available in our current system. We also conduct a pilot user study to

Conclusion and future works

We propose a cluster-aware arrangement of parallel coordinate plots and design an interactive visualization system. The system provides users with a variety of interactive tools to control the dimension ordering and observe the correlations between dimensions. The system can be used for data exploration and can also serve as an interactive demonstration platform. Experiments and user studies show that the category-based axis alignment method can help users better understand multidimensional

Acknowledgments

This work was supported by NSF of China Project No. 61303133 and No. 61303134, the Zhejiang Provincial Natural Science Foundation No. LY18F020024, the National Statistical Scientific Research Project No. 2015LD03, the Zhejiang Science & Technology Plan of China No. 2014C31057 and the First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics-Statistics).

References (29)

  • Z. Zhang et al. A network-based interface for the exploration of high-dimensional data spaces. Pacific Visualization Symposium (2012)
  • W. Wu et al. Telcovis: visual exploration of co-occurrence in urban human mobility based on telco data. IEEE Trans. Visual. Comput. Graph. (2015)
  • H. Qu et al. Visual analysis of the air pollution problem in Hong Kong. IEEE Trans. Visual. Comput. Graph. (2007)
  • J. Hartigan. Printer graphics for clustering. J. Stat. Comput. Simul. (1975)
  • A. Inselberg et al. Parallel coordinates: a tool for visualizing multi-dimensional geometry. Proceedings of the First IEEE Conference on Visualization: Visualization '90 (1990)
  • J. Heinrich et al. State of the art of parallel coordinates
  • E. Kandogan. Star coordinates: a multi-dimensional visualization technique with uniform treatment of dimensions. In Proceedings of the IEEE Information Visualization Symposium, Late Breaking Hot Topics (2000)
  • D.A. Keim et al. Recursive pattern: a technique for visualizing very large amounts of data. Proceedings of the 6th Conference on Visualization '95 (1995)
  • D.A. Keim et al. Pixel bar charts: a visualization technique for very large multi-attribute data sets. Inf. Visual. (2002)
  • H. Chernoff. The use of faces to represent points in k-dimensional space graphically. J. Am. Stat. Assoc. (1973)
  • J.M. Chambers. Graphical Methods for Data Analysis (1983)
  • T. Kohonen. Self-organized formation of topologically correct feature maps. Biol. Cybern. (1982)
  • Jolliffe et al. Principal component analysis. Springer Berlin (1986)
  • W.S. Torgerson. Multidimensional scaling: I. Theory and method. Psychometrika (1952)