Dimensionality reduction on Anchorgraph with an efficient Locality Preserving Projection
Introduction
In many real-world applications, data are represented by vectors that lie in high-dimensional spaces. This introduces large deviations into the similarity measures of features and brings high computational costs to classification and retrieval tasks [1], [2], [3], [4], [5], [6] in various multimedia applications. Reducing the dimensionality of features while preserving their discriminative information therefore plays an important role in data preprocessing.
One of the classical dimensionality reduction algorithms is Principal Component Analysis (PCA) [7], which searches for a set of orthogonal basis vectors that capture the directions of maximum variance in the data distribution. However, PCA is limited in handling the nonlinear data found in many real applications, because it assumes that the data can be embedded in a linear subspace of lower dimensionality.
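As a minimal sketch of the PCA idea described above, the following NumPy code centers the data and takes the leading right singular vectors as the orthogonal directions of maximum variance. The function names `pca_fit` and `pca_transform` and the toy data are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def pca_fit(X, k):
    """X has shape (n_samples, n_features). Returns the mean and top-k directions."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k].T  # components has shape (n_features, k)

def pca_transform(X, mean, components):
    """Linear projection; works for training data and unseen points alike."""
    return (X - mean) @ components

rng = np.random.default_rng(0)
# Toy data (assumption): Gaussian cloud stretched along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
mean, components = pca_fit(X, 2)
Y = pca_transform(X, mean, components)
```

Because the projection is a single matrix product, it only captures linear structure, which is the limitation the paragraph above points out.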
To cope with nonlinear data, dimensionality reduction methods based on manifold learning have been proposed and have produced impressive results. For example, Isomap [8] preserves certain inter-point relationships via the underlying global geometry. Locally Linear Embedding (LLE) [9] and Laplacian Eigenmap (LE) [10] preserve the local geometry of the manifold, each with a different measuring principle. These approaches handle fixed training sets well. However, because they do not produce an explicit mapping function between the high- and low-dimensional spaces, they cannot be applied directly to new data points.
Later, He et al. [11] proposed Locality Preserving Projection (LPP), which incorporates the linear embedding assumption into manifold learning. LPP can therefore handle both fixed training data and new data points. The method can also be carried out in a reproducing kernel Hilbert space (RKHS) via the kernel trick.
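The following is a minimal sketch of a standard LPP formulation, assuming a symmetric k-nearest-neighbor graph with heat-kernel weights and the generalized eigenproblem X^T L X a = λ X^T D X a (samples as rows). The function name `lpp_fit`, the parameters `n_neighbors`, `t` and `reg`, and the toy data are illustrative assumptions; the excerpt itself does not give these details.

```python
import numpy as np

def lpp_fit(X, n_components, n_neighbors=5, t=None, reg=1e-6):
    """X has shape (n_samples, n_features). Returns a (n_features, n_components) projection."""
    n, d = X.shape
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)  # exclude self-neighbors
    idx = np.argsort(dist2, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    if t is None:
        t = dist2[rows, idx].mean()  # heuristic heat-kernel bandwidth (assumption)
    W = np.zeros((n, n))
    W[rows, idx] = np.exp(-dist2[rows, idx] / t)
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(d)  # small ridge keeps B positive definite
    # Reduce A a = lambda B a to a symmetric standard problem via B = C C^T.
    C = np.linalg.cholesky(B)
    Cinv = np.linalg.inv(C)
    M = Cinv @ A @ Cinv.T
    _, evecs = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues in ascending order
    return Cinv.T @ evecs[:, :n_components]  # smallest eigenvalues preserve locality

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))  # toy data (assumption)
P = lpp_fit(X, 2)
Y = X @ P  # embedding of the training points
y_new = rng.normal(size=(1, 5)) @ P  # a new point is mapped by the same matrix
```

Note that the dense `W` here has n × n entries, which is exactly the cost that becomes prohibitive as the data size grows.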
However, these manifold-based dimensionality reduction methods now face challenges because of rapidly increasing data sizes. First, for n data points they must construct a graph of size n × n to measure the adjacency relationship. Second, when conducted in Hilbert space via the kernel trick, they need to solve a generalized eigenvalue problem for an n × n matrix with O(n^3) complexity. Both steps introduce huge time and storage costs for data preprocessing.
In this paper, we propose a novel manifold-based linear dimensionality reduction method, called Anchorgraph-based Locality Preserving Projection (AgLPP). Given a large set of data points, we first adopt a clustering algorithm to obtain a small number of cluster centers as virtual anchors, which have been shown in [12], [13], [14], [15] to have strong representation power and to cover the vast point cloud adequately. Then, by regarding these anchors as transformation points, we measure the point-to-point relationship between real data points in two steps. Finally, with the adjacency matrix built from this relationship, the time and storage costs of AgLPP grow linearly with the data size.
The contributions of this paper are highlighted as follows. First, as AgLPP keeps the linear embedding assumption of LPP, it can be applied to any new data point to locate its mapped position in the reduced representation space. We additionally extend AgLPP to a kernel version and reformulate it into a novel sparse representation. Second, AgLPP and its two extended variants compute the mapping functions for nonlinear data with Anchorgraph. As a result, dimensionality reduction on a large-scale database can be implemented with lower time and storage costs. Third, we conduct experiments to empirically validate our methods on five datasets. The experimental results demonstrate the improved effectiveness and efficiency of our methods.
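The anchor-based construction can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the Anchor Graph form commonly used in the literature, W = Z Λ^{-1} Z^T with Λ = diag(Z^T 1), where row i of Z holds normalized kernel weights from point i to its `s` nearest anchors. The names `kmeans`, `anchor_weights` and `graph_matvec`, and the parameters `s` and `sigma2`, are hypothetical.

```python
import numpy as np

def kmeans(X, m, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns m cluster centers used as anchors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def anchor_weights(X, anchors, s=3, sigma2=None):
    """Z has shape (n, m); each row has s nonzero weights that sum to 1."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(len(X))[:, None]
    if sigma2 is None:
        sigma2 = d2[rows, nearest].mean()  # heuristic bandwidth (assumption)
    Z = np.zeros_like(d2)
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / (2.0 * sigma2))
    Z /= Z.sum(axis=1, keepdims=True)
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))  # toy data (assumption)
anchors = kmeans(X, m=12)
Z = anchor_weights(X, anchors, s=3)
lam = np.maximum(Z.sum(axis=0), 1e-12)  # anchor degrees, guarded against zero

def graph_matvec(v):
    """Computes W @ v for W = Z diag(1/lam) Z^T without forming the n x n matrix."""
    return Z @ ((Z.T @ v) / lam)
```

The point-to-point relationship is thus measured in two steps, point to anchor and anchor to point, and only the n × m matrix `Z` needs to be stored, which is what makes the cost linear in n for a fixed number of anchors.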
The rest of this paper is organized as follows. In Section 2, we briefly discuss related work. AgLPP is introduced in Section 3. In Section 4, we present experimental results on several real-world databases. Finally, we conclude the paper in Section 5.
Related work
We focus on dimensionality reduction models that are designed to cope with nonlinear data, because many kinds of real-world data lie on or near a manifold in the high-dimensional space.
One popular family of these models is the manifold-based dimensionality reduction method, which is designed to keep the inter-point similarity from the original high dimensional space to the low one. For many years, researchers have focused on preserving the global property of the whole point set. Kruskal
Dimensionality reduction on Anchorgraph
To incorporate scalable relationships of large databases into manifold-based dimensionality reduction methods, a natural approach is to perform learning tasks on Anchorgraph [12], [13]. In this section, we describe AgLPP, which is based on Locality Preserving Projection and Anchorgraph construction (a flowchart is shown in Fig. 1). We begin with a description of Locality Preserving Projection [11].
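One plausible way to combine the two ingredients, offered as a sketch under stated assumptions and not necessarily the authors' exact formulation, is to substitute an anchor-graph adjacency W = Z Λ^{-1} Z^T into the LPP eigenproblem. If every row of Z sums to 1, every node has degree 1, so D = I and L = I − W, and X^T L X can be computed from the small matrix Z^T X without ever forming an n × n matrix. The function name `anchor_lpp`, the parameter `reg`, and the random stand-in for Z are hypothetical.

```python
import numpy as np

def anchor_lpp(X, Z, n_components, reg=1e-6):
    """X: (n, d) data; Z: (n, m) nonnegative anchor weights with rows summing to 1."""
    n, d = X.shape
    lam = np.maximum(Z.sum(axis=0), 1e-12)  # diagonal of Lambda
    ZX = Z.T @ X  # (m, d), costs O(n m d)
    G = X.T @ X  # (d, d), costs O(n d^2)
    # X^T (I - Z Lambda^{-1} Z^T) X, assembled from small matrices only.
    A = G - ZX.T @ (ZX / lam[:, None])
    B = G + reg * np.eye(d)  # X^T D X with D = I, plus a small ridge
    C = np.linalg.cholesky(B)
    Cinv = np.linalg.inv(C)
    M = Cinv @ A @ Cinv.T
    _, evecs = np.linalg.eigh((M + M.T) / 2.0)  # ascending eigenvalues
    return Cinv.T @ evecs[:, :n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))  # toy data (assumption)
# Hypothetical stand-in for real anchor weights: random rows normalized to sum to 1.
Z = rng.random((500, 20))
Z /= Z.sum(axis=1, keepdims=True)
P = anchor_lpp(X, Z, 2)
Y = X @ P  # low-dimensional embedding; new points are mapped with the same P
```

In this sketch the work is dominated by products with the n × m matrix `Z` and the n × d matrix `X`, so both time and memory scale linearly with n for fixed m and d.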
Experiments setup
In this section, we present several experimental results and comparisons to evaluate the effectiveness and efficiency of our methods. The adopted datasets include two small-scale datasets, Digit1 and USPS; two medium-scale datasets, USPS-All and Newsgroups; and one large-scale dataset, MNIST-Train [42], [43]. The details of these datasets are listed in Table 1. We briefly describe the five datasets below.
For Digit1 and USPS datasets, their instances were both divided into two
Conclusions
In this paper, we propose AgLPP by addressing the challenges of the traditional dimensionality reduction methods in terms of computational complexity and storage cost. In addition, we extend AgLPP into two novel variants which are implemented in the new transformed feature space. As both the time and storage costs of the AgLPP grow linearly with the data size, this method is potentially useful in dealing with much larger datasets. Furthermore, our strategy that employs the Anchorgraph
Acknowledgments
This work was supported by National Natural Science Foundation of China (Grant no. 61301222), China Postdoctoral Science Foundation (Grant no. 2013M541821), Fundamental Research Funds for the Central Universities (Grant nos. 2013HGQC0018, 2013HGBH0027, 2013HGBZ0166)
References (50)
- et al., Active learning on anchorgraph with an improved transductive experimental design, Neurocomputing (2016)
- et al., A framework for optimal kernel-based manifold embedding of medical image data, Comput. Med. Imaging Graph. (2015)
- et al., Parametric nonlinear dimensionality reduction using kernel t-sne, Neurocomputing (2015)
- et al., Self-taught dimensionality reduction on the high-dimensional small-sized data, Pattern Recognit. (2013)
- et al., Eigenanatomy: sparse dimensionality reduction for multi-modal medical image analysis, Methods (2015)
- et al., Supervised kernel locality preserving projections for face recognition, Neurocomputing (2005)
- et al., Discriminant sparse neighborhood preserving embedding for face recognition, Pattern Recognit. (2012)
- et al., Nonparametric discriminant multi-manifold learning for dimensionality reduction, Neurocomputing (2015)
- et al., Low-rank matrix factorization with multiple hypergraph regularizer, Pattern Recognit. (2015)
- M. Guillaumin, J. Verbeek and C. Schmid, Multimodal semi-supervised learning for image classification, in: Proceedings...
- High-order distance-based multiview stochastic learning in image classification, IEEE Trans. Cybern.
- Image annotation by multiple-instance learning with discriminative feature mapping and selection, IEEE Trans. Cybern.
- Multimodal graph-based reranking for web image search, IEEE Trans. Image Process.
- Adaptive hypergraph learning and its application in image classification, IEEE Trans. Image Process.
- Analysis of a complex of statistical variables into principal components, J. Educ. Psychol.
- Nonlinear dimensionality reduction by locally linear embedding, Science
- EMR: a scalable graph-based ranking model for content-based image retrieval, IEEE Trans. Knowl. Data Eng.
- Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika
Rui Jiang is pursuing a master's degree in the School of Computer and Information, Hefei University of Technology (HFUT). His research interests include pattern recognition and image processing.
Weijie Fu is pursuing a bachelor's degree in the School of Computer and Information, Hefei University of Technology (HFUT). His research focuses on machine learning.
Li Wen is pursuing a bachelor's degree in the School of Computer and Information, Hefei University of Technology (HFUT). Her research interests include information retrieval and pattern recognition.
Shijie Hao received his Ph.D. from Hefei University of Technology (HFUT) in 2012. He is currently an assistant professor in School of Computer and Information, HFUT. His research interests are machine learning, image processing and multimedia content analysis.
Richang Hong received his Ph.D. from University of Science and Technology of China (USTC) in 2008. He is currently a professor in School of Computer and Information, HFUT. His research interests are multimedia content analysis, pattern recognition and data mining.