Optimal decisions in combining the SOM with nonlinear projection methods

https://doi.org/10.1016/j.ejor.2005.05.030

Abstract

Visual data mining is an efficient way to involve humans in the search for an optimal decision. This paper focuses on optimizing the visual presentation of multidimensional data.

A variety of methods for projecting multidimensional data onto the plane have been developed, and there is currently a tendency to use them jointly. In this paper, two sequential combinations of the self-organizing map (SOM) with two other well-known nonlinear projection methods, Sammon’s mapping and multidimensional scaling (MDS), are examined theoretically and experimentally. The investigations showed that the two combinations (SOM_Sammon and SOM_MDS) have similar efficiency. This justifies applying MDS together with the SOM, since up to now most studies have combined the SOM with Sammon’s mapping. Problems of the quality and accuracy of such combined visualization are discussed. Three criteria of different nature are selected for evaluating the efficiency of the combined mapping; used jointly, they allow us to choose the best visualization result from several candidates.

Several ways of initializing nonlinear mappings are examined, and a new one is proposed. A new approach to SOM visualization is also proposed.

The results obtained allow us to make better decisions when optimizing data visualization.

Introduction

Real-world data points are often described by an array of parameters, i.e., the data are multidimensional. These data points form a data set, and the problem is to discover knowledge in this set of multidimensional points. A variety of data mining methods can be used for the analysis (clustering, classification, visualization, etc.); the proper choice depends, e.g., on the goals of the analysis, the data structure and the amount of data. Visualization is a powerful tool in data analysis because it makes data easier to perceive. Classic visualization methods include, for example, Sammon’s mapping [33], multidimensional scaling (MDS) [3] and principal components [25]. Modern visualization methods are represented by neural networks: the self-organizing map (SOM) [20], [21] and neural-network-based realizations of Sammon’s mapping (SAMANN) [28]. At present, combinations of the classic methods with modern ones (in particular, with neural networks) are developing rapidly.

In this paper, we investigate sequential combinations of the SOM with Sammon’s mapping or multidimensional scaling (MDS) and discuss the quality and accuracy of such visualization. We also suggest the best way of initializing nonlinear mappings. When different visualization methods are compared, the quality of projection must be evaluated, yet each method optimizes its own criterion of quality or error. Therefore, this paper presents and applies a set of criteria that describe projection quality independently of the method.

This paper is organized as follows. Section 2 surveys the basic data mapping algorithms and analyses some of them (SOM, MDS and Sammon’s mapping) in detail. Section 3 investigates ways of combining the visualization methods. Section 4 discusses the problems of initializing nonlinear projection methods. Section 5 describes a set of criteria of mapping quality. Section 6 presents the results of the analysis. The final section summarizes the conclusions.

Section snippets

The basic data mapping algorithms

Many methods can be used for reducing the dimensionality of data and, in particular, for visualizing the n-dimensional vectors $X_1, \ldots, X_s \in \mathbb{R}^n$, where $s$ is the number of vectors. Thorough reviews of these methods are given, e.g., by Kaski [19] and Kohonen [20], [21], and the discussion below is based mostly on these reviews. It shows the place of our approach (sequential application of the self-organizing map and Sammon’s mapping or MDS) in the general context of
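Since Sammon’s mapping is one of the methods analysed in this section, the following is a minimal NumPy sketch of it. It assumes the usual form of Sammon’s error, $E = \frac{1}{\sum_{i<j} d^*_{ij}} \sum_{i<j} \frac{(d^*_{ij} - d_{ij})^2}{d^*_{ij}}$, where $d^*_{ij}$ is the distance between $X_i$ and $X_j$ in $\mathbb{R}^n$ and $d_{ij}$ is the distance between their two-dimensional images. Sammon’s original algorithm uses a pseudo-Newton step; plain gradient descent with an adaptive step size is swapped in here for brevity, and all function names are hypothetical. The input vectors are assumed to be distinct, so that every $d^*_{ij} > 0$.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(D, Y, eps=1e-12):
    """Sammon's error of the 2-D points Y w.r.t. the original distance matrix D."""
    iu = np.triu_indices(len(Y), k=1)
    d_star, d = D[iu], pairwise_distances(Y)[iu]
    return np.sum((d_star - d) ** 2 / (d_star + eps)) / np.sum(d_star)

def sammon_gradient(D, Y, eps=1e-12):
    """Gradient of Sammon's error with respect to every row of Y."""
    c = D[np.triu_indices(len(Y), k=1)].sum()
    d = pairwise_distances(Y)
    W = (D - d) / (D * d + eps)          # (d*_ij - d_ij) / (d*_ij * d_ij)
    np.fill_diagonal(W, 0.0)
    # dE/dY_i = -(2/c) * sum_j W_ij * (Y_i - Y_j)
    return -(2.0 / c) * (W.sum(axis=1)[:, None] * Y - W @ Y)

def sammon(X, Y0, n_iter=300):
    """Minimise Sammon's error from the start Y0 by adaptive-step gradient descent."""
    D = pairwise_distances(X)
    Y = Y0.astype(float).copy()
    E, G, step = sammon_stress(D, Y), sammon_gradient(D, Y), 1.0
    for _ in range(n_iter):
        Y_try = Y - step * G
        E_try = sammon_stress(D, Y_try)
        if E_try < E:                    # accept the step and enlarge it
            Y, E, step = Y_try, E_try, step * 1.2
            G = sammon_gradient(D, Y)
        else:                            # reject the step and halve it
            step *= 0.5
    return Y, E

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))             # 30 vectors in R^5
Y0 = rng.normal(scale=0.01, size=(30, 2))
Y, E = sammon(X, Y0)
print(f"Sammon's error: {sammon_stress(pairwise_distances(X), Y0):.3f} -> {E:.3f}")
```

Because a step is accepted only when it lowers the error, the error never increases; the search can nevertheless stop at a local optimum, which is why the choice of initial two-dimensional vectors matters.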

Combining the visualization methods

Not only individual methods but also their combinations are often used when multidimensional data are projected onto the plane.

The self-organizing map provides structured information about the set of analysed vectors: several elements (neurons) of a two-dimensional rectangular grid are activated (become winners), while the remaining elements are not. The activated elements of the grid may be considered as points on the plane. The location of these elements is fixed on the plane
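The SOM side of such a combination can be sketched as follows: a minimal NumPy SOM on a rectangular grid, a function that finds the winning neuron of each input vector, and a projection of the winners’ codebook vectors onto the plane. Assumptions: the grid size, the learning-rate and neighbourhood schedules, and the random initialization are illustrative choices, and classical (Torgerson) MDS stands in for the iterative MDS or Sammon’s mapping that the paper combines with the SOM. Passing only the winning neurons on to the second method follows the description above, but the paper’s exact procedure is not shown in this excerpt; all names are hypothetical.

```python
import numpy as np

def train_som(X, rows=6, cols=6, n_epochs=20, lr0=0.5, seed=0):
    """Train a rectangular SOM with a Gaussian neighbourhood; return codebook and grid."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)  # start from random samples
    sigma0, n_steps, t = max(rows, cols) / 2.0, n_epochs * len(X), 0
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            frac = t / n_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5       # shrinking neighbourhood radius
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)
            t += 1
    return W, grid

def winners(X, W):
    """Index of the best-matching (winning) neuron for each input vector."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def classical_mds(V, m=2):
    """Torgerson's classical MDS: embed the rows of V in m dimensions."""
    n = len(V)
    D2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    J = np.eye(n) - np.ones((n, n)) / n               # centring matrix
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:m]                  # m largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 4)) for c in (0.0, 3.0, 6.0)])
W, grid = train_som(X)
win = winners(X, W)
active = np.unique(win)               # neurons that won for at least one vector
Y = classical_mds(W[active])          # 2-D coordinates of the winning codebook vectors
print(f"{len(active)} of {len(W)} neurons are winners")
```

Mapping only the winners keeps the second, more expensive step small: its cost depends on the number of activated neurons rather than on the number of original vectors.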

Problems of initialization of mapping methods

In nonlinear projection methods, the initial values of the two-dimensional vectors influence the final result. The optimization methods used in visualization algorithms often find a local, rather than the global, optimum of the function that characterizes projection quality. Different sets of initial vectors often lead to different local optima, so the placement of the initial vectors is very important.
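To illustrate why the initial vectors matter, the sketch below runs the same local optimizer (adaptive-step gradient descent on Sammon’s error, an assumed stand-in for the paper’s optimizer) from three commonly used starting configurations: random points, the projection onto the first two principal components, and the two original coordinates with the largest variance. These three options are assumptions chosen for illustration; the new initialization proposed in the paper is not described in this excerpt and is not reproduced. Function names are hypothetical.

```python
import numpy as np

def dists(Z):
    """Euclidean distance matrix between the rows of Z."""
    return np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))

def stress_and_grad(D, Y, eps=1e-12):
    """Sammon's error and its gradient with respect to the 2-D points Y."""
    iu = np.triu_indices(len(Y), k=1)
    c = D[iu].sum()
    d = dists(Y)
    E = np.sum((D[iu] - d[iu]) ** 2 / (D[iu] + eps)) / c
    W = (D - d) / (D * d + eps)
    np.fill_diagonal(W, 0.0)
    return E, -(2.0 / c) * (W.sum(axis=1)[:, None] * Y - W @ Y)

def descend(X, Y0, n_iter=200):
    """Local optimisation from Y0; returns the final Sammon's error."""
    D = dists(X)
    Y = Y0.astype(float).copy()
    E, G = stress_and_grad(D, Y)
    step = 1.0
    for _ in range(n_iter):
        Y_try = Y - step * G
        E_try, G_try = stress_and_grad(D, Y_try)
        if E_try < E:                # accept the step and try a larger one next time
            Y, E, G, step = Y_try, E_try, G_try, step * 1.2
        else:                        # reject the step and halve it
            step *= 0.5
    return E

def init_random(X, rng):
    """Random points in the unit square."""
    return rng.uniform(size=(len(X), 2))

def init_pca(X, rng=None):
    """Projection onto the first two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def init_max_variance(X, rng=None):
    """The two original coordinates with the largest variance."""
    idx = np.argsort(X.var(axis=0))[::-1][:2]
    return X[:, idx].astype(float)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
inits = [("random", init_random), ("PCA", init_pca), ("max-variance", init_max_variance)]
for name, init in inits:
    print(f"{name:>12}: final Sammon's error {descend(X, init(X, rng)):.4f}")
```

The optimizer and data are identical in all three runs, so any difference in the final error comes only from the starting configuration.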

The two-dimensional vectors may be initialized in various ways. One of the

Quantitative criteria of mapping

The problem of objectively comparing mapping results arises when multidimensional data are visualized by various methods that optimize different criteria of mapping quality. A set of universal criteria is needed that describe projection quality and apply to different methods. Three criteria of this kind are presented below.

Metric topology preserving (MTP). Given spaces $\mathbb{R}^n$ ($\forall X \in \mathbb{R}^n$) and $\mathbb{R}^m$ ($\forall Y \in \mathbb{R}^m$), where $m < n$, a map $M: X \to Y$ is called topology preserving,
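The MTP definition is cut off in this excerpt, so the sketch below does not implement it or the paper’s other two criteria. As a hedged illustration of method-independent criteria in the same spirit, it computes two generic measures for a pair of configurations: Spearman’s rank correlation between the pairwise distances in $\mathbb{R}^n$ and in $\mathbb{R}^m$, and the average fraction of each point’s k nearest neighbours that the map preserves. The function names and the default k = 5 are assumptions, and the rank computation assumes there are no tied distances.

```python
import numpy as np

def pdist_vec(Z):
    """Condensed vector of pairwise Euclidean distances (i < j)."""
    iu = np.triu_indices(len(Z), k=1)
    return np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))[iu]

def spearman_rho(a, b):
    """Spearman's rank correlation of two samples (assumes no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

def knn_preservation(X, Y, k=5):
    """Mean fraction of each point's k nearest neighbours in X kept among its k nearest in Y."""
    def knn(Z):
        d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
        return np.argsort(d, axis=1)[:, :k]
    return np.mean([len(set(a) & set(b)) / k for a, b in zip(knn(X), knn(Y))])

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Xc = X - X.mean(axis=0)
Y_pca = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][:2].T   # a candidate 2-D map
Y_rand = rng.normal(size=(50, 2))                               # a deliberately poor map
for name, Y in [("PCA", Y_pca), ("random", Y_rand)]:
    rho = spearman_rho(pdist_vec(X), pdist_vec(Y))
    print(f"{name:>6}: rho = {rho:.3f}, kNN preserved = {knn_preservation(X, Y):.3f}")
```

Both measures equal 1 for a perfect map and are computed from the two point configurations alone, so they can compare results of methods that optimize different objectives.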

Data for the analysis

Data sets of different nature, with different dimensionalities and numbers of points, were used in the experiments. The structure of these data sets (clusters, outliers) is known in advance, which allows us to draw conclusions about the visual results obtained with the combined methods.

  1. Clustered data. Ten 10-dimensional points are generated at random; around each of them, nine further 10-dimensional points are generated from a normal distribution: the total number of vectors in the data set
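A hypothetical generator for a data set of this kind is sketched below. The description leaves some details open, so the following are assumptions: the ten centres are drawn uniformly from $[0, 1]^{10}$, the normal distribution around each centre is isotropic with standard deviation 0.05, and the centres themselves are kept in the data set. The total number of vectors used in the paper is cut off in the excerpt and is not implied by this sketch.

```python
import numpy as np

def clustered_data(n_centres=10, per_centre=9, dim=10, sigma=0.05, seed=0):
    """Hypothetical clustered data set; returns the vectors and a cluster label per row.

    Assumptions: centres ~ U[0, 1]^dim; neighbours ~ N(centre, sigma^2 I);
    each centre is itself included in the data set.
    """
    rng = np.random.default_rng(seed)
    centres = rng.uniform(size=(n_centres, dim))
    blocks, labels = [], []
    for k, c in enumerate(centres):
        pts = c + sigma * rng.standard_normal(size=(per_centre, dim))
        blocks.append(np.vstack([c, pts]))        # keep the centre (assumption)
        labels.extend([k] * (per_centre + 1))
    return np.vstack(blocks), np.array(labels)

X, labels = clustered_data()
print(X.shape)  # (100, 10) under the assumption that the centres are included
```

Because the cluster labels are known, a visualization of such data can be judged by whether the ten groups appear as separate compact clouds on the plane.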

Conclusions

This paper focuses on optimizing the visual presentation of multidimensional data. It deals with sequential combinations of the self-organizing map with two methods for the nonlinear projection of multidimensional data onto a lower-dimensional space: Sammon’s mapping and multidimensional scaling. The research covers the analysis of the basic data mapping algorithms (the self-organizing map, multidimensional scaling and Sammon’s mapping), combining the visualization

References (36)

  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics (1936).
  • A. Flexer, Limitations of self-organizing maps for vector quantization and multidimensional scaling, in: M.C. Mozer,...
  • J.H. Friedman et al., A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on Computers (1974).
  • G.J. Goodhill, T. Sejnowski, Quantifying neighbourhood preservation in topographic mappings, in: Proceedings of the 3rd...
  • L. Guttman, A general nonmetric technique for finding the smallest coordinate space for a configuration of points, Psychometrika (1968).
  • J. Hartigan, Clustering Algorithms (1975).
  • P. Hassinen, J. Elomaa, J. Rönkkö, J. Halme, P. Hodju, Screen shots taken from program called Nenet v1.1a, Neural...
  • T. Hastie et al., Principal curves, Journal of the American Statistical Association (1989).

Cited by (18)

  • Synergies of Operations Research and Data Mining, European Journal of Operational Research (2010).
  • Pairwise elastic self-organizing maps, 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization, WSOM 2017 - Proceedings (2017).
  • Method for visual detection of similarities in medical streaming data, International Journal of Computers, Communications and Control (2015).