Neurocomputing

Volume 74, Issue 17, October 2011, Pages 2780-2789

Stochastic neighbor projection on manifold for feature extraction

https://doi.org/10.1016/j.neucom.2011.03.036

Abstract

This paper develops a manifold-oriented stochastic neighbor projection (MSNP) technique for feature extraction. MSNP is designed to find a linear projection that captures the underlying pattern structure of observations that actually lie on a nonlinear manifold. In MSNP, the similarity information of observations is encoded as a stochastic neighbor distribution based on a geodesic distance metric, and the same distribution is then required to hold in feature space. This learning criterion not only enables MSNP to extract nonlinear features through a linear projection, but also makes MSNP competitive, since distribution preservation is more workable and flexible than rigid distance preservation. MSNP is evaluated in three applications: data visualization for face images, face recognition and palmprint recognition. Experimental results on several benchmark databases suggest that the proposed MSNP provides an unsupervised feature extraction approach with powerful pattern revealing capability for complex manifold data.

Introduction

The data to be processed in many applications of modern machine intelligence, e.g., pattern recognition, image retrieval, knowledge discovery and computer vision, are often acquired or modeled in high-dimensional form. It is well acknowledged that high-dimensional data pose many challenges to data processing, such as high computational complexity, huge storage demands and degraded performance [1], [2]. Feature extraction, as a branch of dimensionality reduction, overcomes the curse of dimensionality [2] by mapping high-dimensional data into a low-dimensional subspace in which data redundancy is reduced. The goal of feature extraction is to find meaningful low-dimensional representations of high-dimensional data and simultaneously discover the underlying pattern structure. Feature extraction methods can be broadly categorized into two classes: linear subspace methods such as PCA and LDA, and nonlinear approaches such as kernel-based and geometry-based techniques.

Linear feature extraction tries to find a linear subspace to serve as the feature space, preserving certain characteristics of the observed data. Specifically, PCA [3] projects data along the directions that maximize the total variance of features. MDS [4] seeks the low-rank projection that best preserves the inter-point distances given by the pairwise distance matrix. PCA and MDS are theoretically equivalent when Euclidean distance is employed. ICA [5] seeks projection directions along which the probability distributions of features are statistically independent of one another. Compared with PCA, MDS and ICA, whose linear subspaces contain no discriminant information, LDA [6] learns a linear projection with the assistance of class labels. LDA magnifies the inter-class scatter while shrinking the intra-class scatter in feature space for the purpose of better separability. Recently, GMMS [7], HMSS [8] and MMDA [9] have markedly improved the performance of LDA by solving a knotty problem of classical LDA: some classes may merge with each other when the feature dimension is lower than the number of classes. GMMS achieves its improvement by employing a general geometric mean function in the criterion. HMSS implicitly replaces the arithmetic mean used in LDA with the harmonic mean. MMDA performs discriminant analysis with a novel criterion that directly maximizes the minimum pairwise distance of all classes so as to separate all classes in the low-dimensional subspace. Generally speaking, linear subspace methods show good performance on feature extraction for data with linear structure, but may be suboptimal for data containing complicated nonlinear structure, such as a nonlinear submanifold embedded in the observation space.
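To make the linear-subspace idea concrete, the following is a minimal Python/numpy sketch of the PCA criterion just described; it is an illustration only, not the implementation evaluated in this paper.

    import numpy as np

    def pca(X, d):
        # Project the N x n data matrix X onto the d directions of maximal variance.
        Xc = X - X.mean(axis=0)                 # center the data
        C = Xc.T @ Xc / (len(X) - 1)            # n x n sample covariance
        eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
        W = eigvecs[:, ::-1][:, :d]             # top-d principal directions
        return Xc @ W                           # N x d feature representation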

To deal with nonlinear structural data, a number of nonlinear approaches have been developed for dimensionality reduction or feature extraction, with two in particular attracting wide attention: kernel-based techniques and geometry-based techniques. Kernel-based techniques implicitly map raw data into a potentially much higher dimensional feature space in order to convert the data from a nonlinear structure to a linear one. With the aid of a kernel function, kernel-based methods extract nonlinear features by applying linear techniques in the implicit feature space. Representative kernel-based methods include KPCA [10], KICA [11] and KLDA [12], [13], and they have proven effective for feature extraction in the nonlinear case. In contrast with kernel-based methods, geometry-based methods adopt a geometrical perspective to explore the intrinsic structure of data. Representative methods include the so-called manifold learning algorithms and their extensions. The well known manifold learning algorithms such as Isomap [14], LLE [15], LE [16], HLLE [17] and LTSA [18] were all developed for nonlinear dimensionality reduction with the help of differential geometry. Isomap calculates pairwise geodesic distances of observations and preserves these distances via classical MDS in the embedding space so as to unfold the nonlinear manifold. LLE focuses on the local neighborhood of each point, requiring that the weights which best linearly reconstruct a point from its neighbors also reconstruct its low-dimensional embedding. LE is built on the Laplace-Beltrami operator on the manifold: it constructs an undirected weighted graph that encodes the neighbor relations of pairwise points, then recovers the structure of the manifold through graph manipulation. HLLE estimates a neighborhood-based Hessian matrix to capture local properties and obtains the low-dimensional embeddings through eigenvalue decomposition of that matrix. LTSA first uses an approximated local tangent space to encode local geometry, then aligns all the local tangent spaces to obtain a global embedding. Since manifold learning methods obtain low-dimensional embeddings without an explicit mapping, it is intractable for them to extract features beyond the training sample set. Many endeavors have been made to overcome this out-of-sample problem [19]. NPE [20] tries to find a linear subspace that preserves local structure under the same principle as LLE. LPP [21] seeks the optimal linear approximation to the eigenfunctions of the Laplace-Beltrami operator on the manifold. LLTSA [22] finds the linear projection that approximates the affine transform of LTSA, and DLA [23] is another linear extension of LTSA that takes discriminant information into account. Besides, in order to understand the common properties of different approaches to dimensionality reduction and to investigate their intrinsic relationships, the graph embedding framework [24] and the patch alignment framework [25] have been proposed. The graph embedding framework models the dimensionality reduction problem in the language of graph theory, and the patch alignment framework adopts the viewpoint of exploring local properties followed by global alignment. These theoretical frameworks not only deepen our understanding of the various algorithms that have been developed for dimensionality reduction, but also offer the potential to develop more effective methods. For instance, MEN [26] combines sparse projection with the manifold learning principle, and TCA [27] is a semi-supervised feature extraction method based on a graph-theoretic framework, together with its orthogonal extension.
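Since geodesic distance estimation recurs later in MSNP, a brief sketch of the Isomap-style estimate may help: Euclidean distances are trusted only between near neighbors and then chained into shortest paths. The Python snippet below, using scipy and scikit-learn with the neighborhood size k as a free parameter, is a minimal illustration and not the code used in the paper.

    from scipy.sparse.csgraph import shortest_path
    from sklearn.neighbors import kneighbors_graph

    def geodesic_distances(X, k=10):
        # Isomap-style geodesic estimate: trust Euclidean distances only
        # between k nearest neighbors, then chain them into shortest paths.
        G = kneighbors_graph(X, n_neighbors=k, mode='distance')  # sparse kNN graph
        D = shortest_path(G, method='D', directed=False)         # Dijkstra over the graph
        return D  # D[i, j] approximates the geodesic distance between x_i and x_j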

More recently, there has been a lot of interest in a novel dimensionality reduction method called SNE [28]. SNE converts the pairwise dissimilarities of inputs into a Gaussian-based probability distribution in the high-dimensional space, then requires the embeddings to retain the same probability distribution. SNE can be seen as a manifold learning approach, as it captures the intrinsic structure of data by preserving neighboring identities. t-SNE [29] extends SNE by using a Student t-distribution to model pairwise dissimilarities in the low-dimensional space. SNE and t-SNE achieve impressive results in recovering the underlying structure of a data manifold, but they suffer from two inherent shortcomings. First, SNE and t-SNE model pairwise similarity based on Euclidean distance in the raw data space, then perform dimensionality reduction subject to that pairwise similarity. However, since Euclidean distance cannot faithfully reflect the intrinsic similarity relations when data lie on a nonlinear manifold, the capability of SNE and t-SNE to unfold the manifold is constrained by this inaccurate prior knowledge of the similarity relationship. Second, SNE and t-SNE encounter the out-of-sample problem just as Isomap, LLE and LE do, which is inconvenient for feature extraction tasks. The reason is that SNE and t-SNE obtain low-dimensional coordinates only for the training samples, without constructing an explicit mapping between the input space and the output space.
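The heavy-tailed low-dimensional similarity that t-SNE substitutes for SNE's Gaussian can be written in a few lines. The following Python sketch, with an assumed function name and an embedding matrix Y as input, computes the matrix of pairwise q_ij values; it is illustrative only.

    import numpy as np

    def tsne_low_dim_affinities(Y):
        # Pairwise similarities of embeddings under a Student t-distribution
        # with one degree of freedom (a Cauchy kernel), as in t-SNE.
        D2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)  # squared distances
        inv = 1.0 / (1.0 + D2)                                  # heavy-tailed kernel
        np.fill_diagonal(inv, 0.0)                              # q_ii is defined as zero
        return inv / inv.sum()                                  # normalize over all pairs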

Inspired by SNE and t-SNE, we explore how to measure similarity on a manifold more accurately, and propose a projection approach called MSNP for feature extraction. To be specific, the pairwise similarities of the raw data are expressed as a probability distribution based on geodesic distance, and in a similar way, the similarity relationships of the feature points are modeled as another probability distribution based on the Cauchy distribution. We construct a criterion with respect to the projection matrix that minimizes the KL-divergence between the two distributions so as to preserve the intrinsic geometry of the data. An efficient iterative algorithm based on the conjugate gradient method is designed to solve our model. MSNP provides a simple unsupervised feature extraction approach that is sensitive to nonlinear manifold structure. Experimental results show that the features produced by MSNP recover the intrinsic pattern structure, and demonstrate that MSNP outperforms many competing unsupervised feature extraction methods in biometrics.
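Putting the pieces together, the MSNP criterion can be sketched as follows. In this illustrative Python fragment, P is assumed to be precomputed from geodesic distances (e.g., via the geodesic_distances sketch above) and normalized into a probability distribution; the function name, the flattened parameterization and the use of scipy's generic conjugate-gradient routine are stand-ins of ours, not the authors' implementation, whose exact derivation appears in Section 3.

    import numpy as np
    from scipy.optimize import minimize

    def msnp_objective(w, X, P, d):
        # KL(P || Q) as a function of the flattened projection matrix w.
        # P: precomputed neighbor distribution from geodesic distances
        # (assumed symmetric, non-negative, summing to one).
        W = w.reshape(X.shape[1], d)                    # unflatten for the optimizer
        Y = X @ W                                       # linear projection to feature space
        D2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        Q = 1.0 / (1.0 + D2)                            # Cauchy similarities of features
        np.fill_diagonal(Q, 0.0)
        Q = Q / Q.sum()
        eps = 1e-12                                     # guard the logarithms
        return np.sum(P * np.log((P + eps) / (Q + eps)))

    # Conjugate-gradient minimization over the projection, in the spirit of the
    # paper's solver; scipy's generic 'CG' with numeric gradients is a stand-in:
    # res = minimize(msnp_objective, w0, args=(X, P, d), method='CG')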

The paper is structured as follows: in Section 2, we provide a brief review of SNE and t-SNE. Section 3 describes the detailed derivation of the MSNP algorithm. The experimental results and analysis are presented in Section 4. Finally, we provide some concluding remarks in Section 5.

Section snippets

SNE and t-SNE

Consider the problem of representing n-dimensional data vectors x_1, x_2, ..., x_N by d-dimensional (d << n) vectors y_1, y_2, ..., y_N such that y_i represents x_i. The basic principle of SNE is to convert pairwise Euclidean distances into probabilities of selecting neighbors in order to model pairwise similarities. In particular, the similarity of datapoint x_j to datapoint x_i is described by the conditional probability p_{j|i} that x_i would pick x_j as its neighbor:

p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}
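As a concreteness check, the conditional probabilities above can be computed directly. This Python sketch assumes the per-point bandwidths sigma_i are given (SNE itself tunes each one to match a target perplexity); the function name is ours and the code is illustrative only.

    import numpy as np

    def sne_conditional_p(X, sigma):
        # p_{j|i} of the formula above; sigma[i] is the Gaussian bandwidth of
        # point i (SNE tunes each sigma_i to match a target perplexity).
        D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)   # squared Euclidean distances
        logits = -D2 / (2.0 * np.square(sigma)[:, None])        # row i scaled by sigma_i
        np.fill_diagonal(logits, -np.inf)                       # a point never picks itself
        E = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable softmax
        return E / E.sum(axis=1, keepdims=True)                 # each row sums to one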

Stochastic neighborhood projection on manifold

In this section, we introduce the MSNP algorithm, which focuses both on capturing nonlinear geometrical structure by preserving the similarity-related probability distribution and on exploring an explicit mapping from raw data to features. We begin with a description of how to measure the real pairwise similarities on a manifold.

Experimental results

In this section, we evaluate the effectiveness of our MSNP method for feature extraction. Several experiments are carried out on typical databases to demonstrate its good behavior in exploring nonlinear pattern structure and extracting valid features for recognition tasks.

Conclusion and future work

In this paper, we present a novel unsupervised feature extraction method, called MSNP. We design MSNP for the purpose of discovering the underlying structure of a data manifold through a linear projection. MSNP models the structure of the manifold by a stochastic neighbor distribution in the high-dimensional observation space. Using the Cauchy distribution to model the stochastic distribution of features, MSNP recovers the manifold structure through a linear projection by requiring the two distributions to be consistent.

Acknowledgements

The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. This project was partially supported by the National Natural Science Foundation of China (Nos. 60632050, 60775015).


References (35)

  • T. Zhang et al., Linear local tangent space alignment and application to face recognition, Neurocomputing (2007)
  • D. Donoho, High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality, AMS Math Challenges Lecture,...
  • R. Bellman, Adaptive Control Processes: A Guided Tour (1961)
  • M. Turk et al., Eigenfaces for recognition, Journal of Cognitive Neuroscience (1991)
  • T. Cox et al., Multidimensional Scaling (1994)
  • P. Comon, Independent component analysis, a new concept?, Signal Processing (1994)
  • P.N. Belhumeur et al., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
  • D. Tao et al., Geometric mean for subspace selection, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
  • W. Bian, D. Tao, Harmonic mean for subspace selection, in: 19th International Conference on Pattern Recognition, 2008,...
  • W. Bian, D. Tao, Max-Min distance analysis by using sequential SDP relaxation for dimension reduction, IEEE...
  • B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation (1998)
  • F.R. Bach et al., Kernel independent component analysis, Journal of Machine Learning Research (2002)
  • S. Mika et al., Invariant feature extraction and classification in kernel spaces, Advances in Neural Information Processing Systems (1999)
  • J. Yang et al., KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)
  • J. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation (2003)

Songsong Wu obtained his Bachelor's degree in Information and Computing Sciences from Nanjing University of Posts and Telecommunications in 2004, and his Master's degree in Measuring and Testing Technologies and Instruments from Nanjing Forestry University in 2007. Currently, he is a PhD candidate in the School of Computer Science and Technology, Nanjing University of Science and Technology, on the subject of Pattern Recognition and Intelligence Systems. His current research interests include manifold learning, pattern recognition and computer vision.

Mingming Sun obtained his Bachelor of Science in Mathematics at Xinjiang University in 2002 and his PhD in the Department of Computer Science at the Nanjing University of Science and Technology (NUST) on the subject of Pattern Recognition and Intelligence Systems in 2007. He is now a lecturer at the School of Computer Science and Technology at the Nanjing University of Science and Technology. His current research interests include pattern recognition, machine learning and image processing.

Jingyu Yang received the BS degree in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. From 1982 to 1984 he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. From 1993 to 1994 he was a visiting professor at the Department of Computer Science, Missouri University, and in 1998 he was a visiting professor at Concordia University in Canada. He is currently a professor and chairman of the Department of Computer Science at NUST. He is the author of over 300 scientific papers in computer vision, pattern recognition, and artificial intelligence, and has won more than 20 provincial and national awards. His current research interests are in the areas of pattern recognition, robot vision, image processing, data fusion, and artificial intelligence.
