
Neurocomputing

Volume 187, 26 April 2016, Pages 109-118

Dimensionality reduction on Anchorgraph with an efficient Locality Preserving Projection

https://doi.org/10.1016/j.neucom.2015.07.128

Abstract

Manifold-learning-based dimensionality reduction methods have been successfully applied in many pattern recognition tasks, due to their ability to capture the underlying relationships between data points well. However, as data sizes grow rapidly, these methods face challenges in terms of storage cost and computational complexity. We propose an improved dimensionality reduction algorithm called Anchorgraph-based Locality Preserving Projection (AgLPP), which copes with these limitations via a novel estimation of the relationship between data points. We further extend AgLPP into a kernel version and reformulate it into a novel sparse representation. Experiments on several real-world datasets demonstrate the effectiveness and efficiency of our methods.

Introduction

In many real-world applications, data are represented as vectors lying in high-dimensional spaces, which distorts the similarity measures between features and brings high computational costs for classification and retrieval tasks [1], [2], [3], [4], [5], [6] in various multimedia applications. Reducing the feature dimensionality while capturing the discriminative information therefore plays an important role in data preprocessing.

One of the classical dimensionality reduction algorithms is Principal Component Analysis (PCA) [7], which searches for a set of orthogonal basis vectors that capture the directions of maximum variance of the data distribution. However, PCA has its limitations in addressing nonlinear data in many real applications, because it is based on the assumption that the data can be embedded in a linear subspace of lower dimensionality.
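As a brief illustration of this linear-subspace assumption, the following sketch reduces a data matrix with NumPy; the function name, target dimension, and random data are purely illustrative and not part of the paper.

import numpy as np

def pca_reduce(X, d):
    # Center the data and take the top-d right singular vectors as the
    # orthogonal basis capturing the directions of maximum variance.
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:d].T        # low-dimensional coordinates (n_samples x d)

X = np.random.rand(100, 50)             # 100 points in a 50-dimensional space
Y = pca_reduce(X, d=2)                  # -> shape (100, 2)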

To cope with the nonlinear data, dimensionality reduction methods based on manifold learning have been proposed and produced impressive results. For example, Isomap [8] is an approach which preserves certain inter-point relationships via the underlying global geometry. Locally Linear Embedding (LLE) [9] and Laplacian Eigenmap (LE) [10] are the methods that preserve the local geometry of the manifold with different measuring principles respectively. These approaches deal with fixed training sets well. However, as they do not produce an explicit mapping function between high and low dimensional spaces, they cannot be applied to new data points.
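The out-of-sample limitation can be illustrated with scikit-learn's Laplacian Eigenmap implementation (SpectralEmbedding), which embeds a fixed training set but exposes no transform for unseen points; this snippet is only a sketch of the limitation, not code from the paper.

import numpy as np
from sklearn.manifold import SpectralEmbedding   # Laplacian Eigenmaps

X_train = np.random.rand(500, 64)

le = SpectralEmbedding(n_components=2, n_neighbors=10)
Y_train = le.fit_transform(X_train)   # embeds the fixed training set

# There is no le.transform(X_new): the embedding yields coordinates only for
# the points it was fitted on, so unseen samples require re-running the whole
# algorithm, which is exactly the limitation discussed above.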

Later, He et al. [11] proposed an approach called Locality Preserving Projection (LPP), which incorporates the linear embedding assumption into manifold learning. LPP can effectively deal with both fixed training data and new data points. This method can be conducted in the reproducing kernel Hilbert space (RKHS) via kernel tricks as well.
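A minimal sketch of LPP in the spirit of [11]: build a k-nearest-neighbor graph with heat-kernel weights and solve the resulting generalized eigenvalue problem for an explicit linear projection. The neighborhood size k, kernel width t, and regularization term are illustrative assumptions.

import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, d, k=5, t=1.0):
    # X: n_samples x n_features (rows are data points)
    dist = kneighbors_graph(X, n_neighbors=k, mode='distance').toarray()
    W = np.where(dist > 0, np.exp(-dist**2 / t), 0.0)   # heat-kernel weights
    W = np.maximum(W, W.T)                              # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                           # graph Laplacian
    # Generalized eigenproblem: X^T L X a = lambda X^T D X a (smallest eigenvalues)
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])         # small ridge for stability
    _, vecs = eigh(A, B)
    P = vecs[:, :d]                                     # explicit projection matrix
    return X @ P, P

X = np.random.rand(300, 20)
Y, P = lpp(X, d=2)
Y_new = np.random.rand(5, 20) @ P    # new points map through the same explicit projection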

These manifold-based dimensionality reduction methods, however, now meet some challenges because of the rapidly increasing data size. First, they must construct an adjacency graph of size O(N²) to measure the adjacency relationships among the N data points; for example, with N = 60,000 (the size of MNIST-Train), a dense double-precision adjacency matrix already requires roughly 29 GB of memory. Second, when conducted in Hilbert space via kernel tricks, they need to solve a generalized eigenvalue problem for an N×N matrix with O(N³) complexity. Both processes introduce substantial time and storage costs in data preprocessing.

In this paper, we propose a novel manifold-based linear dimensionality reduction method, called Anchorgraph-based Locality Preserving Projection (AgLPP). Given a large set of data points, we first adopt a clustering algorithm to obtain a small number of cluster centers as virtual anchors, which have been shown in [12], [13], [14], [15] to have enough representation power to adequately cover the vast point cloud. Then, regarding these anchors as transformation points, we measure the point-to-point relationship between real data points in two steps. Finally, based on the adjacency matrix induced by this novel relationship, the time and storage costs of AgLPP are linear with respect to the data size. The contributions of this paper are highlighted as follows. First, as AgLPP keeps the linear embedding assumption of LPP, the method can be applied to any new data point to locate its mapped position in the reduced representation space. We additionally extend AgLPP to a kernel version and reformulate it into a novel sparse representation. Second, our AgLPP and its two extended variants compute the mapping functions for nonlinear data with the Anchorgraph. As a result, dimensionality reduction on a large-scale database can be implemented with lower time and storage costs. Third, we conduct experiments to empirically validate our methods on five datasets. The experimental results demonstrate the effectiveness and efficiency of our methods.
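A minimal sketch of the two-step relationship estimation outlined above, roughly following the Anchorgraph construction of [12], [13]: k-means centers serve as anchors, each point is linked to its s nearest anchors, and the point-to-point adjacency W = Z Λ⁻¹ Zᵀ is used only implicitly, so nothing of size N×N is ever stored. The Gaussian weighting, parameter values, and the simplified eigenproblem are assumptions for illustration, not the paper's exact formulation.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

def anchor_weights(X, n_anchors=64, s=3, sigma=1.0):
    # Z: n_samples x n_anchors, each row has s nonzeros and sums to one
    anchors = KMeans(n_clusters=n_anchors, n_init=10).fit(X).cluster_centers_
    dist = pairwise_distances(X, anchors)
    idx = np.argsort(dist, axis=1)[:, :s]                 # s nearest anchors per point
    rows = np.arange(X.shape[0])[:, None]
    Z = np.zeros_like(dist)
    Z[rows, idx] = np.exp(-dist[rows, idx]**2 / (2 * sigma**2))
    return Z / Z.sum(axis=1, keepdims=True)

def aglpp_sketch(X, d, **kwargs):
    # LPP-style projection on the Anchorgraph adjacency W = Z diag(Z^T 1)^{-1} Z^T.
    # Rows of Z sum to one, so the degrees of W are all one and L = I - W.
    Z = anchor_weights(X, **kwargs)
    Lam_inv = np.diag(1.0 / np.maximum(Z.sum(axis=0), 1e-12))   # m x m only
    A = X.T @ X - (X.T @ Z) @ Lam_inv @ (Z.T @ X)               # X^T (I - W) X, W never formed
    B = X.T @ X + 1e-6 * np.eye(X.shape[1])
    _, vecs = eigh(A, B)
    return X @ vecs[:, :d], vecs[:, :d]

X = np.random.rand(2000, 30)
Y, P = aglpp_sketch(X, d=2, n_anchors=64, s=3)

Because only Z (of size N×m) and a few m×m matrices are manipulated, the cost of building the adjacency information grows linearly with the number of samples, which is the property exploited here.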

The rest of this paper is organized as follows. In Section 2, we briefly discuss related work. Our AgLPP is introduced in Section 3. In Section 4, we present the experimental results on several real-world databases. Finally, we conclude the paper in Section 5.


Related work

We focus on dimensionality reduction models that are designed to cope with nonlinear data, because many kinds of real-world data lie on or near a manifold embedded in a high-dimensional space.

One popular family of such models is the manifold-based dimensionality reduction methods, which are designed to preserve the inter-point similarity when mapping from the original high-dimensional space to the low-dimensional one. For many years, researchers focused on preserving the global properties of the whole point set. Kruskal

Dimensionality reduction on Anchorgraph

A natural way to incorporate scalable relationship estimation for large databases into manifold-based dimensionality reduction methods is to perform the learning tasks on an Anchorgraph [12], [13]. In this section, we describe our AgLPP, which is based on Locality Preserving Projection and the Anchorgraph construction (a flowchart is shown in Fig. 1). We begin with a description of Locality Preserving Projection [11].
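For reference, the LPP objective [11] on which this section builds can be written as follows; the notation (data points as columns of X, adjacency W, degree matrix D, Laplacian L = D − W) is the standard one and is restated here only as a reading aid:

\min_{\mathbf{a}} \sum_{i,j} \left(\mathbf{a}^{\top}\mathbf{x}_i - \mathbf{a}^{\top}\mathbf{x}_j\right)^2 W_{ij}
\quad \text{s.t.} \quad \mathbf{a}^{\top} X D X^{\top} \mathbf{a} = 1,

which leads to the generalized eigenvalue problem X L X^{\top}\mathbf{a} = \lambda X D X^{\top}\mathbf{a}.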

Experiments setup

In this section, we show several experimental results and comparisons to evaluate the effectiveness and efficiency of our methods. The adopted datasets include two small-scale datasets (Digit1 and USPS), two medium-scale datasets (USPS-All and Newsgroups), and one large-scale dataset (MNIST-Train) [42], [43]. The details of these datasets are listed in Table 1. We briefly describe the five datasets used in our experiments below.

For the Digit1 and USPS datasets, the instances were both divided into two

Conclusions

In this paper, we propose AgLPP to address the challenges of traditional dimensionality reduction methods in terms of computational complexity and storage cost. In addition, we extend AgLPP into two novel variants which are implemented in the new transformed feature space. As both the time and storage costs of AgLPP grow linearly with the data size, the method is potentially useful for dealing with much larger datasets. Furthermore, our strategy that employs the Anchorgraph

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61301222), the China Postdoctoral Science Foundation (Grant no. 2013M541821), and the Fundamental Research Funds for the Central Universities (Grant nos. 2013HGQC0018, 2013HGBH0027, 2013HGBZ0166).

References (50)

  • J. Yu et al., High-order distance-based multiview stochastic learning in image classification, IEEE Trans. Cybern. (2014).
  • R. Hong et al., Image annotation by multiple-instance learning with discriminative feature mapping and selection, IEEE Trans. Cybern. (2014).
  • M. Wang et al., Multimodal graph-based reranking for web image search, IEEE Trans. Image Process. (2012).
  • X. Liu and B. Huet, Concept detector refinement using social videos, in: Proceedings of the International Workshop on...
  • J. Yu et al., Adaptive hypergraph learning and its application in image classification, IEEE Trans. Image Process. (2012).
  • H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. (1933).
  • J.B. Tenenbaum, Mapping a manifold of perceptual observations, in: Proceedings of the Conference on Advances in Neural...
  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000).
  • M. Belkin and P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Proceedings of the...
  • X. He and P. Niyogi, Locality preserving projections, in: Proceedings of the Conference on Advances in Neural...
  • W. Liu, J. He, S.F. Chang, Large graph construction for scalable semi-supervised learning, in: Proceedings of the 27th...
  • W. Liu, J. Wang, S.F. Chang, Robust and scalable graph-based semisupervised learning, Proceedings of the IEEE, 100(9),...
  • B. Xu et al., EMR: a scalable graph-based ranking model for content-based image retrieval, IEEE Trans. Knowl. Data Eng. (2013).
  • J.B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika (1964).
  • K.Q. Weinberger and L.K. Saul, An introduction to nonlinear dimensionality reduction by maximum variance unfolding, in:...

Rui Jiang is pursuing a master's degree in the School of Computer and Information, Hefei University of Technology (HFUT). His research interests include pattern recognition and image processing.

Weijie Fu is pursuing a bachelor's degree in the School of Computer and Information, Hefei University of Technology (HFUT). His research focuses on machine learning.

Li Wen is pursuing a bachelor's degree in the School of Computer and Information, Hefei University of Technology (HFUT). Her research interests include information retrieval and pattern recognition.

Shijie Hao received his Ph.D. from Hefei University of Technology (HFUT) in 2012. He is currently an assistant professor in the School of Computer and Information, HFUT. His research interests are machine learning, image processing and multimedia content analysis.

Richang Hong received his Ph.D. from the University of Science and Technology of China (USTC) in 2008. He is currently a professor in the School of Computer and Information, HFUT. His research interests are multimedia content analysis, pattern recognition and data mining.
