Visualization of transformed multivariate data sets with autoassociative neural networks

https://doi.org/10.1016/S0167-8655(98)00054-3

Abstract

Artificial neural networks have recently gained prominence as powerful tools for the projection of high-dimensional data, where fast interactive mapping of multi-dimensional data onto 2D or 3D maps with as little distortion as possible is required. These methods typically generate static maps of the data, based on some optimization criterion. A new strategy is therefore proposed, based on the transformation of the data prior to use of autoassociative neural networks. It is shown that this strategy allows more flexible visualization of the data than either Kohonen or hidden-target backpropagation (Sammon) neural networks, in that various perspectives of the multi-dimensional space can be explored by dynamically mapping the data with respect to user-defined vantage points in that space.

Introduction

Data mining refers to various techniques for the exploration of process data via an intensive search for information, in order to uncover new data or insights into hidden process trends. Although the technologies available for data mining include case-based reasoning, fractal geometry, abductive network modelling, etc., one of the best-known ways to mine data is by using artificial neural networks (Lewinson, 1994). However, most of the proposed methods, such as Kohonen self-organizing feature maps (Kohonen, 1990), Sammon neural networks (Jain and Mao, 1992; Tattersall and Limb, 1994) and principal component neural networks (Mao and Jain, 1995), generate static displays of the data based on some optimization criterion, whereas it is recognized that future systems will probably be based on rapid dynamic display of the data.

In this letter a novel strategy is proposed, based on the transformation of multivariate data prior to the use of autoassociative backpropagation neural networks to visualize and explore the data. In contrast to other neural network methods, this strategy is more flexible in that it allows the data to be viewed from different perspectives, as defined by the user.
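The transformation step itself is not specified in this excerpt; as an illustrative assumption only, one simple way to re-express data with respect to a user-defined vantage point is to shift each pattern so the vantage point becomes the origin and append each pattern's Euclidean distance to that point as an extra coordinate. The function name and the exact transform are hypothetical, not the authors' method:

```python
import numpy as np

def transform_wrt_vantage(X, v):
    """Re-express each row of X relative to a vantage point v.

    Illustrative assumption: shift the origin to v and append the
    Euclidean distance to v as an additional coordinate.
    """
    D = X - v                                      # patterns relative to v
    r = np.linalg.norm(D, axis=1, keepdims=True)   # distance to the vantage point
    return np.hstack([D, r])

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))       # ten 3-D patterns
v = np.zeros(3)                    # user-chosen vantage point
Xt = transform_wrt_vantage(X, v)
print(Xt.shape)                    # one extra column per pattern
```

Because the transform depends on `v`, re-running the mapping for a different vantage point yields a different view of the same data, which is the flexibility the letter argues for.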

Section snippets

Visualization of data with autoassociative neural networks

Neural networks are data-driven computing devices which have been especially successful in the interpretation of patterns, cluster analysis and the characterization of ill-defined process trends (Maren et al., 1990). Neural network (and other) techniques aimed at the projection and visualization of data attempt to characterize the structure or distribution of the data to be analyzed and to project these data to a lower-dimensional space in order to facilitate analysis and interpretation.
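An autoassociative network of the kind described by Kramer (1992) is trained to reproduce its input at its output through a narrow bottleneck layer; the bottleneck activations then serve as the low-dimensional map. The following is a minimal NumPy sketch of such a network (3-5-2-5-3 architecture, tanh mapping layers, linear bottleneck and output, plain full-batch gradient descent); all layer sizes and learning settings are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))            # toy 3-D data to be mapped to 2-D

# Widths: input -> tanh hidden -> linear 2-unit bottleneck -> tanh hidden -> linear output
sizes = [3, 5, 2, 5, 3]
W = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(s) for s in sizes[1:]]
TANH = (0, 2)                            # indices of the tanh mapping layers

def forward(X):
    acts = [X]
    for i, (Wi, bi) in enumerate(zip(W, b)):
        z = acts[-1] @ Wi + bi
        acts.append(np.tanh(z) if i in TANH else z)
    return acts

def mse():
    return float(np.mean((forward(X)[-1] - X) ** 2))

lr, err0 = 0.05, mse()
for epoch in range(300):
    acts = forward(X)
    delta = (acts[-1] - X) / len(X)      # error signal at the linear output
    for i in range(len(W) - 1, -1, -1):
        gW, gb = acts[i].T @ delta, delta.sum(0)
        if i > 0:
            delta = delta @ W[i].T       # propagate before updating W[i]
            if i - 1 in TANH:            # back through tanh: multiply by 1 - a^2
                delta = delta * (1.0 - acts[i] ** 2)
        W[i] -= lr * gW
        b[i] -= lr * gb

err1 = mse()                             # reconstruction error after training
Z = forward(X)[2]                        # bottleneck activations = 2-D projection
print(Z.shape)
```

The key point is that `Z` is obtained as a by-product of minimizing the reconstruction error, so the projection is driven entirely by the data the network is asked to reproduce.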

Example 1. Mapping of bitetrahedrally arranged three-dimensional clusters

In order to gain a better understanding of the use of autoassociative neural networks for the projection of multivariate data, a set of four 3-dimensional clusters (A, B, C and D) is considered, arranged along the vertices of two tetrahedra joined at their bases and with apices pointing in opposite directions. Clusters A, B and C are roughly spherical with equal radii of approximately 0.25 and are uniformly distributed along the three respective vertices (denoted v1, v2 and v3) of the
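A data set of this shape can be generated as follows. The exact vertex coordinates used by the authors are not given in this excerpt, so the geometry below (equilateral base triangle in the z = 0 plane, apices on the z-axis) and the placement of cluster D at the two apices are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed geometry: base triangle shared by the two tetrahedra lies in z = 0,
# with the opposing apices on the z-axis.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([-0.5, np.sqrt(3) / 2, 0.0])
v3 = np.array([-0.5, -np.sqrt(3) / 2, 0.0])
apices = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])

def sphere_cluster(center, radius, n):
    """n points uniformly distributed inside a sphere about center."""
    u = rng.normal(size=(n, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # random directions
    r = radius * rng.random(n) ** (1 / 3)           # radii for uniform density
    return center + u * r[:, None]

A = sphere_cluster(v1, 0.25, 100)
B = sphere_cluster(v2, 0.25, 100)
C = sphere_cluster(v3, 0.25, 100)
# Cluster D: assumed here to occupy the two opposite apices.
D = np.vstack([sphere_cluster(c, 0.25, 50) for c in apices])
X = np.vstack([A, B, C, D])
print(X.shape)
```

The cube-root scaling of the radii makes the points uniform in volume rather than concentrated near each cluster center.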

Example 2. Mapping of the features of three iris species

The second example concerns a set of data characterizing three species of iris flowers, viz. Iris setosa, Iris versicolor and Iris virginica, originally described by Anderson (1939) and explored extensively by many investigators since. A total of 150 observations was available (50 for each class). Each individual specimen is characterized by four attributes, namely sepal length (x1), sepal width (x2), petal length (x3) and petal width (x4). The data were mapped by means of a Kohonen

Example 3. Mapping of multi-dimensional Gaussian clusters

The third example consists of data embedded in a 10-dimensional space, namely two normally distributed clusters with equal unit variances, σ1 = σ2 = (1, 1, 1, …, 1)T, and equal means, μ1 = μ2 = (1, 1, 1, …, 1)T, with 500 patterns per class. The same data set was previously investigated by Mao and Jain (1995).

As before, the data were projected to two dimensions by use of a Kohonen, Sammon and autoassociative neural network, with and without transformation of the data in the latter case. The Kohonen neural

Discussion of results and conclusions

In the visualization and exploration of multi-dimensional data manifolds, it is important to ascertain the sizes, shapes and boundaries of clusters, outliers, etc. Although recently proposed algorithms provide a powerful means to visualize multivariate data sets, it is usually not possible to view the data from different perspectives. In the procedure described above, the data are first transformed, prior to mapping. The projection is subsequently conducted via optimization of both the

References (13)

  • M.A. Kramer

    Autoassociative neural networks

    Comput. Chem. Engrg.

    (1992)
  • E. Anderson

    The irises of the Gaspé Peninsula

    Bull. Amer. Iris Soc.

    (1939)
  • G. Biswas et al.

    Evaluation of projection algorithms

    IEEE Trans. Pattern Analysis Machine Intell.

    (1981)
  • G.E. Hinton

    Connectionist learning procedures

    Artif. Intell.

    (1989)
  • A.K. Jain et al.

    Artificial neural network for non-linear projection of multivariate data

    (1992)

  • T. Kohonen

    The self-organizing map

    Proc. IEEE

    (1990)
