Discriminative learning by sparse representation for classification
Introduction
High-dimensional data such as face images and gene microarrays arise in many pattern recognition and data mining applications. Such data present many mathematical challenges as well as some opportunities, and are bound to give rise to new theoretical developments [1]. One of the problems is the so-called “curse of dimensionality” [2]: the dimensionality of the data is much larger than the number of samples. Interestingly, in many cases the number of “important” dimensions is much smaller. Thus, before applying many algorithms to high-dimensional data, we need to reduce the dimensionality of the original data. Dimensionality reduction plays an increasingly important role in practical data processing and analysis tasks, and has been used successfully in many machine learning fields such as face recognition [3], [4] and remote sensing data analysis [5].
Among existing dimensionality reduction algorithms, principal component analysis (PCA) [6] and linear discriminant analysis (LDA) [7] are two classical subspace learning methods. PCA seeks an optimal projection direction such that the variance of the projected data is maximized. PCA builds a low-dimensional representation of the original high-dimensional data and does not exploit the label information of the samples. Unlike PCA, LDA is a supervised method that seeks an optimal discriminative subspace by maximizing the between-class scatter while minimizing the within-class scatter. Recently, a number of manifold learning algorithms have been proposed, which are useful for analyzing data that lie on or near a submanifold of the original space. Locally linear embedding (LLE) [8], Laplacian eigenmaps (LE) [9], neighborhood-preserving embedding (NPE) [10] and locality preserving projections (LPP) [11] are typical representatives of manifold learning algorithms. LLE is an unsupervised learning algorithm that learns the local structure of a nonlinear manifold by computing low-dimensional, neighborhood-preserving embeddings of the original high-dimensional data. LE preserves the proximity relations of pairwise samples by operating on an undirected weighted graph. NPE and LPP model the local submanifold structure by maintaining the neighborhood relations of the samples before and after transformation.
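As a concrete illustration of the two classical criteria, the following sketch (in Python with NumPy; the toy two-class data are hypothetical and not from the paper) computes a PCA direction from the covariance matrix and a two-class LDA direction in closed form:

```python
import numpy as np

# Hypothetical two-class toy data in R^2 (for illustration only).
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # class 0
X1 = rng.normal([3.0, 1.0], 0.5, size=(50, 2))   # class 1
X = np.vstack([X0, X1])

# PCA: unsupervised; direction maximizing the variance of projections.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / len(X)
_, eigvecs = np.linalg.eigh(cov)
w_pca = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue

# LDA (two-class closed form): maximize between-class scatter
# relative to within-class scatter, w proportional to Sw^{-1}(m1 - m0).
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w_lda = np.linalg.solve(Sw, m1 - m0)
w_lda /= np.linalg.norm(w_lda)
```

Note that the PCA direction ignores the class labels entirely, whereas the LDA direction uses them through the class means and the within-class scatter matrix Sw.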
In [12], PCA, LDA and LPP are unified under the graph embedding framework; that is, they differ only in the graph structure and the edge weights. As a result, the graph lies at the heart of many existing dimensionality reduction methods, and how to construct a graph is a central issue in designing different dimensionality reduction algorithms. The k-nearest-neighbor and ε-ball criteria, both under the usual Euclidean norm, are two popular ways of graph construction. They are simple to compute and easy to operate, but their two parameters, the neighborhood size k and the ball radius ε, are sensitive to noise and difficult to determine in many real problems. Qiao [13] and Cheng [14] make use of sparse representation to construct a novel graph, the l1-graph, which inherits many merits of sparse reconstruction and constructs the graph adaptively, free of model parameters. However, they do not use prior knowledge of class identities; that is, they are unsupervised.
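A k-nearest-neighbor graph of the kind discussed above can be sketched as follows (a minimal NumPy implementation with illustrative names; the dense distance matrix is fine for small sample sets only):

```python
import numpy as np

def knn_graph(X, k):
    """Adjacency matrix of a symmetric k-nearest-neighbor graph
    under the usual Euclidean norm (dense, for small sample sets)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-loops
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d[i])[:k]] = 1.0     # edge to the k closest samples
    return np.maximum(W, W.T)                # symmetrize
```

The parameter k must be fixed by hand, which is exactly the sensitivity that the adaptively constructed l1-graph is designed to avoid.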
In this paper, to enhance the classification performance of the SPP algorithm, we propose a new algorithm, called discriminative learning by sparse representation, or DLSP for short. For each sample xi, we use the label information to divide the robust sparse representation coefficients of the original data into two groups: one with the same label as the sample xi and the other with labels different from xi. To achieve better classification performance, the proposed algorithm incorporates the merits of both the local interclass geometrical structure and the sparsity property: DLSP demands local within-class compactness, and also requires that the samples whose labels differ from that of xi be close to their respective class centroids. This makes DLSP possess the advantages of sparse reconstruction and, more importantly, a better capacity for discrimination, especially when the size of the training set is small.
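The label-based grouping of coefficients described above can be sketched as follows (a hypothetical helper with illustrative names; the paper's own notation differs):

```python
import numpy as np

def split_by_label(w, labels, label_i):
    """Split a coefficient vector into a same-label part and a
    different-label part (illustrative, not the paper's notation).

    w        -- sparse coefficients over the other training samples
    labels   -- class labels of those samples
    label_i  -- class label of the sample being represented
    """
    same = np.where(labels == label_i, w, 0.0)
    diff = np.where(labels != label_i, w, 0.0)
    return same, diff
```

The two parts sum back to the original coefficient vector; the same-label part drives within-class compactness, while the different-label part is pushed toward the corresponding class centroids.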
The rest of the paper is organized as follows: in Section 2, we briefly review the original l1-graph and SPP algorithm, respectively. We propose our new algorithm in Section 3. In Section 4, the experimental results are given, and we conclude this paper in Section 5.
Section snippets
A brief review of the l1-graph and SPP
For convenience, we first give some notation used in this paper. Suppose that X = {x1, x2, …, xn} is a set of m-dimensional samples of size n composed of c classes (each class contains nk samples, with n1 + n2 + … + nc = n). l(xk) denotes the class label of sample xk; for example, xk comes from the cth class if l(xk) = c. The kth class centroid is the mean of the samples belonging to the kth class. For each sample xi, the matrix Ai denotes the sample set obtained by excluding xi from X. The robust sparse representation
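The snippet cuts off at the robust sparse representation. As a generic sketch of how such l1-regularized coefficients can be computed, the following uses the standard ISTA iteration for the plain Lasso objective; this solver choice is an assumption, and the paper's exact robust formulation may differ:

```python
import numpy as np

def sparse_code(x, A, lam=0.1, n_iter=500):
    """ISTA sketch for min_w 0.5*||x - A w||_2^2 + lam*||w||_1.

    A holds the remaining training samples as columns. This is a
    generic l1 solver, not the paper's exact robust formulation.
    """
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ w - x)                # gradient of the quadratic term
        z = w - g / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

# Example: x is itself one of the dictionary columns, so the
# recovered coefficient vector should concentrate on that column.
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 20))
A /= np.linalg.norm(A, axis=0)               # unit-norm columns
x = A[:, 0].copy()
w = sparse_code(x, A)
```

Because the l1 penalty drives most coefficients to exactly zero, the nonzero entries of w identify which training samples participate in reconstructing x, which is what the l1-graph records as edge weights.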
Discriminative learning by sparse representation projections
The SPP algorithm preserves many merits of LLE and the l1-graph, and summarizes the overall behavior of the whole sample set in the sparse reconstruction process, but it is an unsupervised learning method that does not use prior knowledge of class identities. To enhance the classification performance of the SPP algorithm, we propose a new method, called discriminative learning by sparse representation projections, or DLSP for short, which incorporates the merits of both local geometry structure and global
Experiments
In this section, to evaluate the effectiveness of the proposed algorithm for classification, we compare the proposed discriminative learning by sparse representation projections (DLSP) algorithm with five dimensionality reduction methods (PCA, LPP, NPE, SPP and LDA) on several publicly available face databases, including ORL [15], UMIST [16] and JAFFE [17], which involve different illumination, expression and pose changes. In all experiments, we keep 98% of the information in the sense of
Conclusions
In this paper, based on sparse representation, we have proposed the DLSP algorithm, a new dimensionality reduction method. DLSP incorporates the merits of both local geometry structure and the global sparsity property. The advantages of DLSP are as follows: DLSP preserves discriminative information, especially when the size of the training set is small; DLSP constructs the graph adaptively and avoids the difficulty of parameter selection found in LPP and NPE; and DLSP performs better than the SPP algorithm in
Acknowledgments
The authors would like to express their gratitude to the anonymous referees as well as the Editor and Associate Editor for their valuable comments, which led to substantial improvements of the paper. This work was supported by the National Basic Research Program of China (973 Program) (Grant no. 2007CB311002) and the National Natural Science Foundation of China (Grant nos. 60675013, 10531030, 61075006/F030401).
Fei Zang received his master degree from the School of Mathematical Science at the University of Electronic Science and Technology of China in 2007. Now, he is a candidate of the Ph.D. degree in School of Science in Xi’an Jiaotong University, China. He is interested in pattern recognition and image processing.
References (17)
- et al., Sparsity preserving projections with applications to face recognition, Pattern Recognition (2010)
- D.L. Donoho, High-dimensional data analysis: the curses and blessings of dimensionality. Lecture delivered at the...
- Adaptive Control Processes: A Guided Tour (1961)
- et al., Nonparametric discriminant analysis for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
- et al., Discriminant common vectors for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)
- et al., Dimensionality reduction of hyperspectral data via spectral feature extraction, IEEE Transactions on Geoscience and Remote Sensing (2009)
- Principal Component Analysis (2002)
- Introduction to Statistical Pattern Recognition (1990)
Jiangshe Zhang was born in 1962. He received his M.S. and Ph.D. degrees in Applied Mathematics from Xi’an Jiaotong University, Xi’an, China, in 1987 and 1993, respectively. He joined Xi’an Jiaotong University, China, in 1987, where he is currently a full Professor in Faculty of Science in Xi’an Jiaotong University, China. Up to now, he has authored and coauthored one monograph and over 50 journal papers on robust clustering, optimization, short-term load forecasting for electric power system, etc. His current research focus is on Bayesian learning, global optimization, computer vision and image processing, support vector machines, neural networks and ensemble learning.