Pattern Recognition Letters

Volume 32, Issue 2, 15 January 2011, Pages 181-189

An improved local tangent space alignment method for manifold learning

https://doi.org/10.1016/j.patrec.2010.10.005

Abstract

Principal component analysis (PCA) is widely used in recently proposed manifold learning algorithms to provide approximate local tangent spaces. However, such approximations provided by PCA may be inaccurate when local neighborhoods of the data manifold do not lie in or close to a linear subspace. Furthermore, the approximated tangent spaces cannot adapt to changes in the density of the data distribution. In this paper, a new method is proposed for providing faithful approximations to the local tangent spaces of a data manifold, which is proved to be more accurate than PCA. With this new method, an improved local tangent space alignment (ILTSA) algorithm is developed, which can efficiently recover the geometric structure of data manifolds even when data are sparse or non-uniformly distributed. Experimental results are presented to illustrate the better performance of ILTSA on both synthetic data and image data.

Research highlights

  • The proposed method is an improvement of the LTSA method.

  • A new method for local tangent space approximation is proposed.

  • It is more accurate than the widely used PCA approximation.

  • The proposed method can deal with sparse or non-uniformly distributed data.

Introduction

In many real-world applications such as data visualization and visual tracking, we are often faced with high-dimensional data samples that have only a few intrinsic degrees of freedom. The set of such data samples can be modeled as a data manifold, and algorithms that aim to reduce dimensionality by revealing the manifold structure can be cast into the framework of manifold learning. Traditional algorithms such as principal component analysis (PCA) (Jolliffe, 1999) and multidimensional scaling (MDS) (Cox and Cox, 2001) are successful only when the data manifold is linear. Recently, progress has been made in developing efficient algorithms that can learn the low-dimensional structure of nonlinear data manifolds. These methods include isometric feature mapping (ISOMAP) (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis and Saul, 2000), Laplacian eigenmap (LE) (Belkin and Niyogi, 2003), Hessian LLE (HLLE) (Donoho and Grimes, 2003), local tangent space alignment (LTSA) (Zhang and Zha, 2004) and many others.

Among nonlinear manifold learning algorithms, LTSA has received wide attention since its geometric intuition is simple and it is straightforward to implement. For data samples drawn from an m-dimensional manifold, LTSA first applies PCA to each neighborhood of data samples to obtain an m-dimensional subspace that approximates the local tangent space. LTSA then computes the local tangent coordinates of the data samples and finally aligns them into a global coordinate system.
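The three LTSA steps just described (local PCA, local tangent coordinates, global alignment) can be sketched with numpy. This is a minimal illustrative implementation under simplifying assumptions (brute-force neighbor search, dense eigensolver), not the authors' reference code; the function name `ltsa` is ours:

```python
import numpy as np

def ltsa(X, n_neighbors, d):
    """Minimal LTSA sketch for an (N, D) sample matrix X and target dimension d."""
    N = X.shape[0]
    # Step 0: k nearest neighbors of each point (brute force; the point itself is included).
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(D2, axis=1)[:, :n_neighbors]
    B = np.zeros((N, N))                        # global alignment matrix
    for i in range(N):
        idx = nbrs[i]
        Xi = X[idx] - X[idx].mean(axis=0)       # center the neighborhood (PCA origin)
        # Step 1: local PCA; the top-d left singular vectors, scaled by the
        # singular values, give the local tangent coordinates of the neighbors.
        U, s, Vt = np.linalg.svd(Xi, full_matrices=False)
        # Step 2: accumulate the alignment matrix from the orthogonal projector
        # onto the complement of span{1, U[:, :d]}.
        G = np.hstack([np.ones((n_neighbors, 1)) / np.sqrt(n_neighbors), U[:, :d]])
        B[np.ix_(idx, idx)] += np.eye(n_neighbors) - G @ G.T
    # Step 3: the global coordinates are the eigenvectors of B for the 2nd
    # through (d+1)-th smallest eigenvalues (the smallest belongs to the constant mode).
    vals, vecs = np.linalg.eigh(B)
    return vecs[:, 1:d + 1]
```

In practice B is sparse, so for large N a k-d tree neighbor search and a sparse eigensolver would replace the brute-force steps above.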

The performance of LTSA depends heavily on the quality of the local tangent spaces approximated by PCA. However, the PCA approximation is accurate only when the following two assumptions hold:

  • (A1)

    data samples are uniformly distributed;

  • (A2)

    data samples in each local neighborhood of the manifold lie in or close to a linear subspace.

When data samples are sparse or non-uniformly distributed, or when the data manifold has large curvatures, these assumptions cannot be met, and the PCA approximation may be a poor estimate. This can make LTSA fail to reveal the manifold structure.

To overcome the drawbacks of using PCA for tangent space approximation, in this paper we propose a new method that obtains accurate approximations to the local tangent spaces even when data samples are sparse or non-uniformly distributed, or when the data manifold has large curvatures. Compared with PCA, our method has the following two features.

  • First, each data sample itself, rather than the mean of its neighborhood samples (as in PCA), is used as the origin of its approximated tangent space. The approximated local tangent space is then not biased when data samples are non-uniformly distributed or sparse.

  • Secondly, the bases of the tangent space are obtained by minimizing a weighted sum of the projection distances rather than the unweighted sum. The effect of curvature can then be accounted for when the data manifold has large curvatures.
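As a sketch, the two modifications above amount to the following estimate of the tangent basis at a sample x_i. The Gaussian distance weights used here are an illustrative assumption, since the paper's exact weighting scheme is defined in its Section 3:

```python
import numpy as np

def weighted_tangent_basis(X_nbr, x_i, d, sigma=1.0):
    """Tangent-space estimate at x_i following the two points above: the
    origin is x_i itself, and the basis minimizes a weighted sum of squared
    projection distances. Gaussian weights are a stand-in assumption."""
    Z = X_nbr - x_i                                           # deviations from x_i, not from the mean
    w = np.exp(-np.sum(Z ** 2, axis=1) / (2.0 * sigma ** 2))  # hypothetical distance weights
    # argmin_Q sum_j w_j ||z_j - Q Q^T z_j||^2 over orthonormal Q is given by
    # the top-d eigenvectors of the weighted scatter sum_j w_j z_j z_j^T,
    # i.e. the top-d right singular vectors of diag(sqrt(w)) Z.
    _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * Z, full_matrices=False)
    Q = Vt[:d].T                                              # (D, d) orthonormal tangent basis
    return Q, Z @ Q                                           # basis and local tangent coordinates
```

Because the origin is pinned at x_i, the estimate is not pulled toward denser regions of the neighborhood, and the weights down-weight far-away neighbors where curvature makes the linear model worst.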

Based on this new tangent space approximation, we propose an improved local tangent space alignment (ILTSA) algorithm which can reveal the underlying manifold structure by aligning the local tangent coordinates. Numerical experiments show that ILTSA can get faithful learning results even when data samples are sparse or non-uniformly distributed or even when the data manifold has large curvatures.

The remainder of the paper is organized as follows. The PCA-based tangent space approximation and its limitations are described in Section 2. The ILTSA algorithm is presented in Section 3. A theoretical analysis of ILTSA is given in Section 4. Experimental results on both synthetic and real-world data sets are presented in Section 5. Some concluding remarks are stated in Section 6.

PCA-based tangent space approximation and its limitations

We first describe the basic steps of the PCA-based tangent space approximation (PTSA) method and then illustrate its limitations using a synthetic example. Given a data manifold, the basic idea of PTSA is to find a linear subspace within each local neighborhood of the data manifold such that (1) the origin of the subspace is at the mean of the data samples of the neighborhood and (2) the sum of the projection distances between the data samples of the neighborhood and their orthogonal projections onto the subspace is minimized.
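A minimal sketch of PTSA, together with a small example of the origin bias it suffers on a curved, non-uniformly sampled neighborhood (the parabola data below are illustrative, not the paper's synthetic example):

```python
import numpy as np

def pca_tangent_space(X_nbr, d):
    """PTSA as described: origin at the neighborhood mean, basis from the
    top-d principal directions (minimizing the unweighted sum of squared
    projection distances)."""
    origin = X_nbr.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_nbr - origin, full_matrices=False)
    return origin, Vt[:d].T

# Limitation: for a curved, non-uniformly sampled neighborhood the mean
# falls off the manifold, biasing the origin (illustrative numbers only).
t = np.array([-1.0, -0.8, 0.6, 0.8, 1.0])   # non-uniform parameter samples
arc = np.stack([t, t ** 2], axis=1)         # neighborhood on the parabola y = x^2
origin, Q = pca_tangent_space(arc, d=1)
# origin lies well above the curve: origin[1] is much larger than origin[0] ** 2
```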

Improved local tangent space alignment algorithm

The improved local tangent space alignment (ILTSA) algorithm consists of two steps: local tangent space approximation and global alignment of the local tangent coordinates. In Section 3.1, we propose a new method for local tangent space approximation. The global alignment step is then given in Section 3.2. Finally, the implementation details of ILTSA are stated in Section 3.3.

Theoretical analysis of ILTSA

In this section, we present a theoretical analysis of ILTSA. We first show why ILTSA can find more accurate tangent space approximations compared with LTSA by analyzing the error between the approximated tangent coordinates and their true values. We then explain why weights are introduced in minimizing the sum of the projecting distances.

Consider the application of ILTSA to an isometric Riemannian manifold M. Suppose M can be globally parameterized so that M = F(Ω) for an open set Ω ⊂ R^m, where F = [F1, …, Fn]^T : Ω → R^n is the parameterization mapping.

Experimental results

In this section, we apply the ILTSA algorithm to both synthetic data and high-dimensional image data to test its performance. Since ILTSA can be viewed as an improved version of LTSA, we mainly compare the performance of ILTSA with LTSA and LPCA. In addition, comparison is also made between ILTSA and other popular manifold learning methods on a synthetic data set.

Conclusions and discussions

In this paper, we first proposed a new method to effectively approximate the local tangent spaces of a data manifold. Compared with PCA-based tangent space approximations, our method provides more accurate approximations to the local tangent bases and coordinates. Then, based on this new approximation method, we proposed an improved LTSA algorithm (called the ILTSA algorithm) which aligns the local tangent coordinates into a single global coordinate system. Compared with the LTSA algorithm, ILTSA can recover the geometric structure of a data manifold more faithfully even when the data are sparse or non-uniformly distributed.

Acknowledgements

This work was partly supported by the NNSF of China Grant No. 90820007, the Outstanding Youth Fund of the NNSF of China Grant No. 60725310, the 863 Program of China Grant No. 2007AA04Z228 and the 973 Program of China Grant No. 2007CB311002. The authors thank the referees for their invaluable comments and suggestions which helped improve the paper greatly.

References (16)

  • F. Camastra, Data dimensionality estimation methods: a survey, Pattern Recognit. (2003)
  • M. Fan et al., Intrinsic dimension estimation of manifolds by incising balls, Pattern Recognit. (2009)
  • A. Asuncion, D. Newman, UCI Machine Learning Repository, 2007. <http://www.ics.uci.edu/mlearn/MLRepository.html>
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and representation, Neural Comput. (2003)
  • M. Carmo, Riemannian Geometry (1992)
  • T.F. Cox et al., Multidimensional Scaling (2001)
  • D.L. Donoho et al., Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data, Proc. Nat. Acad. Sci. USA (2003)
  • I.T. Jolliffe, Principal Component Analysis (1999)
