
Neurocomputing

Volume 72, Issues 13–15, August 2009, Pages 2964-2978

RankVisu: Mapping from the neighborhood network

https://doi.org/10.1016/j.neucom.2009.04.008

Abstract

Most multidimensional scaling methods focus on the preservation of dissimilarities to map high-dimensional items into a low-dimensional space. However, the mapping function usually does not treat the preservation of small dissimilarities as important, since their cost is small with respect to the preservation of large dissimilarities. As a consequence, an item's neighborhood may be sacrificed for the benefit of the overall mapping. We have therefore designed a mapping method devoted to the preservation of neighborhood ranks rather than dissimilarities: RankVisu. It produces a mapping of the data in which neighborhood ranks are as close as possible to those in the original space.

A comparison with both metric and non-metric MDS highlights the pros (in particular, cluster enhancement) and cons of RankVisu.

Introduction

Most of the methods in the dimensionality reduction field focus on the preservation of distances. It has been shown [53] that this purpose is closely related to the objectives of the famous principal component analysis (PCA) [26], [45] and classical multidimensional scaling (Classical MDS) [6], [8], [53]. The linear projections involved in these approaches have obvious limitations, since high-dimensional data often live on low-dimensional manifolds that are not necessarily linear subspaces of the data space. In such cases, mappings featuring curvilinear projections, while possibly retaining local properties of the data, could be of great value. This is the objective of methods known as non-linear multidimensional scaling (non-linear MDS), which usually emphasize short distances. Sammon's mapping [49], curvilinear component analysis (CCA) [12], generative topographic mapping (GTM) [5], locally linear embedding (LLE) [48] and data-driven high-dimensional scaling (DD-HDS) [38], for example, fall within this category. Note that we do not discuss self-organizing maps (SOM) [31], [33] here. Indeed, SOMs achieve a mapping on a discrete grid (vector quantization), with many data points usually placed at similar positions in the output space. In the context of SOMs, the study of distances and ranks requires different methods than those considered in this paper.

Although they primarily focus on distances, MDS methods are often used in situations where the preservation of neighborhood ranks would be more relevant [24], [54], [56], [57] (briefly, the neighborhood rank of item B from item A's point of view is the number of items that are closer to A than B is. Additional details are given in Section 2). A typical example is given in Torgerson's paper [53], which deals with the analysis of psychometric data. Indeed, distances between data cannot always be easily defined or interpreted (for example, in the case of psychophysical, psychometric and phylogenetic data). The analysis of heterogeneous data may also be enhanced by using neighborhood ranks (referred to as NR in this paper) instead of distances: indeed, considering ranks rather than distances makes it possible to locally stretch or shrink a manifold.

The concentration of measure phenomenon must not be underestimated. When dealing with high-dimensional data, distances are strongly affected by it: distances between data points become more or less the same [1], [15]. On the other hand, the distribution of NR is broad regardless of the dimension: ranks are always evenly distributed. Whereas similar distances (generated by the concentration of measure phenomenon) are treated similarly by a mapping function, the corresponding ranks take graded values whatever the dimension of the data. Note that noise may sometimes jeopardize ranks, especially when the concentration of measure phenomenon is noticeable. Whether or not the uniform distribution of ranks is an advantage for the mapping is open for discussion. At the very least, the quality (and differences) of the resulting mappings may provide some arguments for the debate.
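The contrast between concentrating distances and evenly spread ranks can be illustrated with a small simulation (a sketch in Python; the point counts and dimensions below are arbitrary choices, not taken from the paper):

```python
import math
import random

random.seed(0)

def pairwise_distances(points):
    """All pairwise Euclidean distances among a list of points."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dists.append(math.sqrt(sum((a - b) ** 2
                                       for a, b in zip(points[i], points[j]))))
    return dists

spreads = {}
n = 50
for dim in (2, 1000):
    pts = [[random.random() for _ in range(dim)] for _ in range(n)]
    d = pairwise_distances(pts)
    # Relative contrast (max - min) / min collapses as the dimension grows,
    # whereas neighborhood ranks remain uniformly spread over 0..n-2 by
    # construction, whatever the dimension.
    spreads[dim] = (max(d) - min(d)) / min(d)

print(spreads)  # the spread in dimension 2 dwarfs the one in dimension 1000
```

In dimension 1000 the pairwise distances cluster tightly around their mean, so a distance-based stress function sees little structure, while the rank matrix still assigns each neighbor a distinct position.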

Note that special attention should be given to small NR, because they often express meaningful neighborhood relationships. In particular, geodesic-based mappings (i) and non-metric MDS (ii) are expected to improve NR preservation for the following reasons:

  • (i)

    Geodesic-based mappings with k-ary neighborhoods (Isomap [52] and curvilinear distance analysis [37], for example) are grounded in NR: a lattice is drawn that connects each data point to its neighbors (classically, each data point is connected to its k closest neighbors). Distances between data points are then measured along the edges; the derived distance is usually called the “geodesic distance” or “curvilinear distance”. Distances (and data) are subsequently mapped: Isomap [52] extends classical multidimensional scaling (Classical MDS [6], [8], [53]), while curvilinear distance analysis (CDA) [37] extends curvilinear component analysis (CCA) [12]. Geodesic distances should make neighborhood preservation easier when the data space is “rolled out”: because distances to the closest neighbors are kept unchanged within the geodesic paradigm, while other distances can be enlarged, neighborhoods could be better preserved after the mapping. Intuitive justifications of the benefits of using curvilinear distances for mapping are discussed in [52] and [37].

  • (ii)

    Kruskal's mapping [35], [36] and other non-metric MDS methods (e.g., [9], [21], [29], [40], [50], [51]) focus on a criterion (often called a stress function) related to the NR: the purpose is to build a mapping in which the ordering of distances among all pairs in the data set matches the ordering of the original distances. Many non-metric methods have been proposed in order to explore various stress functions. In particular, considerations about rank equality have led to the concepts of “weak” [35], [36] and “strong” [21] “monotonicity”. In addition, with regard to the handling of ranks in the case of equality among original distances, the so-called “approach to ties” has drawn some attention [29]. Lastly, two “normalization factors” used to weight differences in distance ordering have been tested: the sum of squared distances (in [35], [36], for example) and the sum of squared deviations from the mean distance [40]. A comprehensive comparison of these methods has been published previously [9].
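To make the non-metric criterion of point (ii) concrete, the following sketch computes one common form of Kruskal's stress-1, with monotone disparities fitted by the pool-adjacent-violators algorithm. It is a generic textbook formulation, not code taken from any of the methods compared in this paper:

```python
def pava(values):
    """Pool-adjacent-violators: best nondecreasing fit in least squares."""
    blocks = []  # each block holds [mean, count]
    for v in values:
        blocks.append([v, 1])
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    fit = []
    for mean, count in blocks:
        fit.extend([mean] * count)
    return fit

def kruskal_stress1(original, mapped):
    """Stress-1: normalized residual between output-space distances and the
    monotone disparities fitted in the order of the original dissimilarities."""
    order = sorted(range(len(original)), key=lambda k: original[k])
    d = [mapped[k] for k in order]
    d_hat = pava(d)
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a * a for a in d)
    return (numerator / denominator) ** 0.5
```

A mapping whose distances are already in the same order as the original dissimilarities yields a stress of zero; the more the two orderings disagree, the larger the stress.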

Non-metric MDS methods are especially useful for the analysis of psychophysical data (e.g., [25], [30], [41]) and genetic data (e.g., [23], [27], [43]), as they are expected to preserve small NR. Unfortunately, in the present article, we show that this intuitive expectation is not supported by empirical results.

Our purpose here is to build a low-dimensional mapping in which NR are preserved “at best” (close to the non-metric MDS aim), with a special interest in preserving small NR (an idea directly imported from non-linear MDS, which focuses on small distances). This paper is organized as follows. Section 2 briefly introduces the computation of NR. Section 3 presents and justifies the stress function, which is optimized according to the procedure presented in Section 4. Section 5 defines the measures of NR preservation used in Section 6 to compare the mappings obtained by RankVisu, DD-HDS (a metric mapping) [38] and Kruskal's mapping (a non-metric mapping) [35], [36]. Section 7 shows the evolution of the mapping during optimization. The sensitivity of RankVisu to its parameters is analyzed in Section 8.

Section snippets

Neighborhood ranks (NR)

We focus here on relationships between data points represented by a distance matrix. Let d be the n × n matrix quantifying the dissimilarity between every pair of items in the original data space, where n is the size of the data set. Any distance or dissimilarity measure is eligible, as long as the two following properties of d are met: (1) dij ≥ 0 for all i, j, and (2) dij = 0 ⇔ i = j. Neither the triangle inequality nor symmetry is needed to calculate the matrix of NR.

The corresponding rank matrix is D, where Dij is the neighborhood rank of item j from item i's point of view, i.e., the number of items that are closer to i than j is.
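The rank matrix can be computed directly from this definition (a sketch; the tie handling shown here, counting only strictly closer items, is an assumption, since the paper's exact treatment of ties is not given in this excerpt):

```python
def rank_matrix(d):
    """D[i][j]: the number of items strictly closer to i than j is,
    i.e., the neighborhood rank of j from i's point of view."""
    n = len(d)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            D[i][j] = sum(
                1 for k in range(n)
                if k != i and k != j and d[i][k] < d[i][j]
            )
    return D
```

Note that D need not be symmetric even when d is, and, in the absence of ties, each row of D is a permutation of 0, …, n − 2, which is precisely why ranks remain evenly distributed whatever the dimension.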

Stress function

Following most mapping methods, the RankVisu algorithm optimizes a stress function (sometimes called “error function”, “energy function” or “loss function”, depending on the literature) to monitor the mapping. The stress function is designed to quantify the difference between NR in original space (D) and output space (Δ) (see Section 3.1). The stress function also takes into account the distances in the output space (Section 3.2).
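For illustration only, a rank-comparison stress of this general flavor can be sketched as follows. This is NOT the published RankVisu stress function (which, as noted above, also takes output-space distances into account); it is merely a generic example in which small original ranks, i.e., close neighbors, are weighted more heavily, with an assumed exponential weight and neighborhood scale k:

```python
import math

def rank_stress(D, Delta, k=5):
    """Illustrative rank-comparison stress (not the RankVisu formula):
    squared rank differences between the original rank matrix D and the
    output-space rank matrix Delta, down-weighted exponentially so that
    small original ranks dominate the cost."""
    n = len(D)
    s = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                s += math.exp(-D[i][j] / k) * (D[i][j] - Delta[i][j]) ** 2
    return s
```

A mapping that reproduces every neighborhood rank yields zero stress; swapping two close neighbors costs more than swapping two distant ones.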

Mapping optimization

As often occurs in multidimensional scaling, the positions of points have to be gradually modified to decrease the stress. Usually, the final mapping cannot be derived analytically. Many optimization tools have been used to reach this goal: the generalized Newton–Raphson algorithm [46], tabu search [19], genetic algorithms [20], [47], simulated annealing [16] and neural networks [12] are among the most popular.

Force-directed placement (FDP) is increasingly employed for multidimensional
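A generic FDP update can be sketched as follows (a minimal 2-D spring model driven by target distances; RankVisu's actual forces, which are rank-based, are defined in the full text and are not reproduced here):

```python
import math

def fdp_step(pos, target, lr=0.05):
    """One generic force-directed placement step in 2-D: each pair of points
    exerts a spring force proportional to (current distance - target distance),
    pulling too-distant pairs together and pushing too-close pairs apart."""
    n = len(pos)
    new = [list(p) for p in pos]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = lr * (dist - target[i][j]) / dist
            new[i][0] += f * dx
            new[i][1] += f * dy
    return [tuple(p) for p in new]
```

Iterating such steps moves every pair of points toward its target separation; in a rank-based scheme, the targets would be derived from neighborhood ranks rather than from raw distances.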

Evaluation of rank preservation

Three complementary strategies are considered to evaluate rank preservation: (i) two indexes of mapping error (IME) that express the global stress satisfaction (ς1 and ς2); (ii) the pressure that locally characterizes rank preservation; and (iii) the confusion matrix and an associated curve that together globally express rank preservation according to the size of the neighborhood.
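As a simple proxy for this kind of neighborhood-based evaluation, one can measure the fraction of each item's k nearest neighbors that survive the mapping (a sketch; the paper's IME, pressure and confusion-matrix measures are defined in the full text, and this helper is not one of them):

```python
def knn_preservation(D_orig, D_map, k):
    """Fraction of each item's k nearest neighbors (ranks below k in the
    original rank matrix D_orig) that are still among its k nearest
    neighbors in the mapped rank matrix D_map, averaged over all items."""
    n = len(D_orig)
    total = 0.0
    for i in range(n):
        before = {j for j in range(n) if j != i and D_orig[i][j] < k}
        after = {j for j in range(n) if j != i and D_map[i][j] < k}
        total += len(before & after) / k
    return total / n
```

Sweeping k from 1 to n − 1 yields a curve analogous in spirit to the confusion-matrix-based curve described above: a value of 1 for every k would indicate perfect neighborhood preservation.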

Experimental results

Six data sets were used to evaluate the efficiency of RankVisu. The first two are popular statistical benchmarks: the Iris data and the Wine data. The third and fourth are simulated data: the Swiss roll data set, a classical test for mapping algorithms, and an original high-dimensional data set containing four distinct clusters. The last two data sets are real high-dimensional data: namely, a natural scenes data set holding 200 photos characterized by the energy of Gabor filters [22] and

Mapping during optimization

The improvement of rank preservation during optimization, from the initial mapping to the final one, is shown in Fig. 10 using the Wine data as an example. The benefits resulting from permutation and force-directed placement (FDP) can be observed step by step.

The hybrid optimization greatly improves the preservation of NR in the Wine data (Fig. 10). Ranks are somewhat spoiled (at least in terms of ς1) in the initial mapping obtained by Isomap, which considers ς2, but the overall mapping permits convergence towards a

Sensitivity of essential RankVisu parameters using Wine data

Two main parameters must be set when using RankVisu: the dimension of the output space and the number of neighbors considered. We explore here the consequences of modifying these parameters on the Wine data mapping.

Conclusion

Within the dimensionality reduction framework, NR may be considered a very sensitive endpoint. They should be preserved as much as possible, especially when the visualization of high-dimensional data is of concern. However, we have observed that both metric and non-metric MDS fail to preserve small NR for some difficult data sets. We have subsequently proposed RankVisu, a mapping method that focuses on the preservation of small NR.

Generally, the mapping of high-dimensional data onto

Acknowledgment

This work has been partly financed by the DataHighDim grant, which is supported by the French Ministry of Research (“ministère français de la recherche”).

S. Lespinats received the M.S. degree in Biomathematics from Denis Diderot University, Paris VII, France in 2002 and the Ph.D. degree in Biomathematics from Pierre and Marie Curie University, Paris VI, France, in 2006. He is currently a Post-Doc Researcher in INSERM U722 in Paris, France.

References (58)

  • M. Chalmers

    A linear iteration time layout algorithm for visualizing high-dimensional data

  • T. Cox et al.

    Multidimensional Scaling

    (1994)
  • A.P.M. Coxon

    The User's Guide to Multidimensional Scaling

    (1982)
  • B.V. Dasarathy

    Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques

    (1990)
  • P. Demartines, Mesures d’organisation du réseau de Kohonen, Presented at Congrès Satellite du Congrès Européen de...
  • P. Demartines et al.

    Curvilinear component analysis: a self-organizing neural network for nonlinear mapping of data sets

    IEEE Transactions on Neural Networks

    (1997)
  • P.J. Deschavanne et al.

    Genomic signature: characterization and classification of species assessed by chaos game representation of sequences

    Molecular Biology and Evolution

    (1999)
  • G.D. Di Battista et al.

    Graph Drawing: Algorithms for the Visualization of Graphs

    (1999)
  • D.L. Donoho, High-dimensional data analysis: the curses and blessings of dimensionality, in: American Mathematical...
  • K.A. Dowsland, in: C.R. Reeves (Ed.), Simulated Annealing, McGraw-Hill, New York,...
  • P. Eades

    A heuristic for graph drawing, Congressus Numerantium

  • T. Fruchterman et al.

    Graph drawing by force-directed placement

    Software—Practice and Experience

    (1991)
  • F. Glover, M. Laguna, in: C.R. Reeves (Eds.), Tabu Search, McGraw-Hill, New York,...
  • D.E. Goldberg

    Genetic Algorithms in Search, Optimization, and Machine Learning

    (1989)
  • L. Guttman

    A general nonmetric technique for finding the smallest coordinate space for a configuration of points

    Psychometrika

    (1968)
  • N. Guyader, Scènes visuelles: catégorisation basée sur des modèles de perception, Ph.D. Thesis, Université Joseph...
  • M.F. Hammer et al.

    Hierarchical patterns of global human y-chromosome diversity

    Molecular Biology and Evolution

    (2001)
  • G. Hinton, S. Roweis, Stochastic neighbor embedding, in: NIPS, vol. 15, 2002, pp....
  • I. Jolliffe

    Principal Component Analysis

    (2002)

    B. Fertil received the Ph.D. degree in Physics from Paris XI University (France) in 1975 and in Living Sciences from Paris VI university (France) in 1984. He is currently Research Director at the CNRS Institute, UMR 6168, LSIS Laboratory in Marseille (France). His research interests include data mining, image analysis, modeling, with specific applications to medical decision making, bioinformatics and radiobiology.

    P. Villemain received a Ph.D. in Solid State Physics and Magnetism in 1970 at the Department of Fundamental Research of the Nuclear Center of Grenoble, where he worked on the theory of phase transitions. From 1980 to 1995 his research concerned statistical data analysis methods applied to industrial production control and instrumentation, for which he developed real-time software tools.

    He joined the Laboratory of Images and Signals, where he works on the analysis of high-dimensional data. He is an Assistant Professor at Joseph Fourier University in Grenoble, where he teaches mathematics, statistics and real-time computing at the engineering school Polytech’Grenoble.

    J. Herault is Professor Emeritus at the “Laboratoire des Images et des Signaux”, where he has established a research team working on visual perception. His research interests relate to artificial and natural models of neural networks, independent source extraction (ICA), high-dimensional data analysis (CCA), self-organizing networks and vision models. Within the vision theme, he has studied signal processing in the retina (spatio-temporal filtering, colour and non-linearity) and its use in models of the visual cortex. Applications include adaptive image processing, dense motion estimation, scene categorization and image indexing.
