Visualization of transformed multivariate data sets with autoassociative neural networks
Introduction
Data mining refers to techniques for exploring process data through an intensive search for information, with the aim of uncovering new data or insights into hidden process trends. Although the available technologies include case-based reasoning, fractal geometry and abductive network modelling, among others, one of the best known approaches is the use of artificial neural networks (Lewinson, 1994). However, most of the proposed methods, such as Kohonen self-organizing feature maps (Kohonen, 1990), Sammon neural networks (Jain and Mao, 1992; Tattersall and Limb, 1994) and principal component neural networks (Mao and Jain, 1995), generate static displays of the data based on some optimization criterion, whereas it is recognized that future systems will probably be based on rapid dynamic display of the data.
In this letter, a novel strategy is proposed in which multivariate data are transformed prior to the use of autoassociative backpropagation neural networks to visualize and explore the data. In contrast to other neural network methods, this strategy is more flexible in that it allows the data to be viewed from different perspectives, as defined by the user.
Visualization of data with autoassociative neural networks
Neural networks are data-driven computing devices that have been especially successful in pattern interpretation, data and cluster analysis, and the characterization of ill-defined process trends (Maren et al., 1990). Neural network (and other) techniques for the projection and visualization of data attempt to characterize the structure or distribution of the data to be analyzed, and to project these data to a lower-dimensional space in order to facilitate analysis and interpretation.
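The projection idea can be illustrated with a minimal sketch of an autoassociative network, i.e. a network trained to reproduce its own input through a narrow hidden layer whose activations serve as the low-dimensional map. Everything specific here is an assumption made for illustration and is not taken from the letter: the single tanh bottleneck layer, the linear output layer, the plain gradient-descent training loop, and the names `train_autoassociative` and `project`.

```python
import numpy as np

def train_autoassociative(X, n_bottleneck=2, lr=0.1, epochs=1000, seed=0):
    """Train a d -> n_bottleneck -> d network to reproduce its input.

    Hypothetical architecture: one tanh bottleneck layer and a linear
    output layer, trained by batch gradient descent on the mean squared
    reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_bottleneck))
    b1 = np.zeros(n_bottleneck)
    W2 = rng.normal(scale=0.1, size=(n_bottleneck, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # bottleneck activations (the map)
        Y = H @ W2 + b2               # linear reconstruction of the input
        E = Y - X
        losses.append(float(np.mean(E ** 2)))
        # Backpropagate the mean squared reconstruction error.
        dY = 2.0 * E / (n * d)
        dW2 = H.T @ dY
        db2 = dY.sum(axis=0)
        dZ = (dY @ W2.T) * (1.0 - H ** 2)   # derivative of tanh
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def project(X, params):
    """Return the bottleneck coordinates of each row of X."""
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)
```

With a two-node bottleneck, `project(X, params)` returns one pair of coordinates per observation, which can then be plotted as a two-dimensional view of the data.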
Example 1. Mapping of bitetrahedrally arranged three-dimensional clusters
In order to gain a better understanding of the use of autoassociative neural networks for the projection of multivariate data, a set of four 3-dimensional clusters (A, B, C and D) is considered. The clusters are arranged along the vertices of two tetrahedra joined at their bases, with apices pointing in opposite directions. Clusters A, B and C are roughly spherical with equal radii of approximately 0.25 and uniformly distributed along the three respective vertices (denoted v1, v2 and v3) of the
Example 2. Mapping of the features of three iris species
The second example concerns a set of data characterizing three species of iris flowers, viz. Iris setosa, Iris versicolor and Iris virginica, originally described by Anderson (1939) and explored extensively by many investigators since. A total of 150 observations was available (50 for each class). Each individual specimen is characterized by four attributes, namely sepal length (x1), sepal width (x2), petal length (x3) and petal width (x4). The data were mapped by means of a Kohonen
Example 3. Mapping of multi-dimensional Gaussian clusters
The third example consists of data embedded in a 10-dimensional space, namely two normally distributed clusters with equal unit variances Σ1 = Σ2 = (1, 1, 1, …, 1)ᵀ and means μ1 = −μ2 = (1, 1, 1, …, 1)ᵀ, with 500 patterns per class. The same data set was previously investigated by Mao and Jain (1995).
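A data set of this form can be generated as in the sketch below. The function name `make_gaussian_clusters`, the random seed and the 0/1 class labels are assumptions for illustration. The draws will not reproduce the exact patterns used in the letter or by Mao and Jain (1995).

```python
import numpy as np

def make_gaussian_clusters(n_per_class=500, dim=10, seed=0):
    """Two Gaussian clusters with unit variances and means +1 and -1.

    Returns X with shape (2 * n_per_class, dim) and integer labels y
    (0 for the cluster centred on +1, 1 for the cluster centred on -1).
    """
    rng = np.random.default_rng(seed)
    mu1 = np.ones(dim)     # mean of the first cluster: (1, 1, ..., 1)
    mu2 = -mu1             # mean of the second cluster: (-1, -1, ..., -1)
    X1 = rng.normal(loc=mu1, scale=1.0, size=(n_per_class, dim))
    X2 = rng.normal(loc=mu2, scale=1.0, size=(n_per_class, dim))
    X = np.vstack([X1, X2])
    y = np.repeat([0, 1], n_per_class)
    return X, y
```

Because the two means differ by 2 in every coordinate while each coordinate has unit standard deviation, the clusters overlap appreciably in any single coordinate but are well separated along the direction joining the two means.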
As before, the data were projected to two dimensions using Kohonen, Sammon and autoassociative neural networks, with and without transformation of the data in the latter case. The Kohonen neural
Discussion of results and conclusions
In the visualization and exploration of multi-dimensional data manifolds, it is important to ascertain the sizes, shapes and boundaries of clusters, outliers, etc. Although recently proposed algorithms provide a powerful means to visualize multivariate data sets, it is usually not possible to view the data from different perspectives. In the procedure described above, the data are first transformed, prior to mapping. The projection is subsequently conducted via optimization of both the
References (13)
Associative neural networks. Comput. Chem. Engrg. (1992)
The irises of the Gaspé peninsula. Bull. Amer. Iris Soc. (1939)
et al. Evaluation of projection algorithms. IEEE Trans. Pattern Analysis Machine Intell. (1981)
Connectionist learning procedures. Artif. Intell. (1989)
et al. Artificial neural network for non-linear projection of multivariate data.
The self-organizing map. Proc. IEEE (1990)