Neurocomputing

Volume 74, Issues 12–13, June 2011, Pages 2176–2183

Discriminative learning by sparse representation for classification

https://doi.org/10.1016/j.neucom.2011.02.012

Abstract

Recently, the sparsity preserving projections (SPP) algorithm has been proposed, which combines the $\ell_1$-graph, preserving the sparse reconstructive relationship of the data, with classical dimensionality reduction. However, when applied to the classification problem, SPP focuses only on the sparse structure and ignores the label information of the samples. To enhance classification performance, a new algorithm, termed discriminative learning by sparse representation projections, or DLSP for short, is proposed in this paper. DLSP incorporates the merits of both the local interclass geometrical structure and the sparsity property. This gives it the advantages of sparse reconstruction and, more importantly, a better capacity for discrimination, especially when the training set is small. Extensive experimental results on several publicly available data sets show the feasibility and effectiveness of the proposed algorithm.

Introduction

High-dimensional data such as face images and gene microarrays arise in many pattern recognition and data mining applications. Such data present many mathematical challenges as well as some opportunities, and are bound to give rise to new theoretical developments [1]. One of the problems is the so-called "curse of dimensionality" [2]: the dimensionality of the data is much larger than the number of samples. Interestingly, in many cases the intrinsic ("important") dimensionality is much lower. Thus, before applying many algorithms to high-dimensional data, we need to reduce the dimensionality of the original data. Dimensionality reduction plays an increasingly important role in practical data processing and analysis tasks, and has been used successfully in many machine learning fields such as face recognition [3], [4] and remote sensing data analysis [5].

Among existing dimensionality reduction algorithms, principal component analysis (PCA) [6] and linear discriminant analysis (LDA) [7] are two classical subspace learning methods. PCA seeks an optimal projection direction such that the variance of the projected data is maximized. PCA is based on a low-dimensional representation of the original high-dimensional data and does not exploit the label information of samples. Unlike PCA, LDA is a supervised method which seeks an optimal discriminative subspace by maximizing the between-class scatter while minimizing the within-class scatter. Recently, a number of manifold learning algorithms have been proposed, which are useful for analyzing data that lie on or near a submanifold of the original space. Locally linear embedding (LLE) [8], Laplacian eigenmaps (LE) [9], neighborhood-preserving embedding (NPE) [10] and locality preserving projections (LPP) [11] are typical representatives of manifold learning algorithms. LLE is an unsupervised learning algorithm able to learn the local structure of a nonlinear manifold by computing low-dimensional, neighborhood-preserving embeddings of the original high-dimensional data. LE preserves proximity relations of pairwise samples by operating on an undirected weighted graph. NPE and LPP model the local submanifold structure by maintaining the neighborhood relations of the samples before and after the transformation.
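As a minimal illustration of this unsupervised/supervised contrast (our own sketch, not code from this paper), the following Python fragment projects synthetic two-class data with both PCA and LDA; the data and parameter choices are hypothetical:

```python
# Contrast PCA (ignores labels) with LDA (uses labels) on toy data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian classes in 50 dimensions, 30 samples each.
X = np.vstack([rng.normal(0.0, 1.0, (30, 50)),
               rng.normal(0.5, 1.0, (30, 50))])
y = np.repeat([0, 1], 30)

# PCA: directions of maximal variance; labels play no role.
Z_pca = PCA(n_components=2).fit_transform(X)

# LDA: maximizes between-class scatter relative to within-class scatter;
# with two classes at most one discriminative direction exists.
Z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

print(Z_pca.shape, Z_lda.shape)  # (60, 2) (60, 1)
```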

In [12], PCA, LDA and LPP are unified under the graph embedding framework; that is, they differ only in their graph structure and edge weights. As a result, the graph lies at the heart of many existing dimensionality reduction methods, and how to construct it is a central issue in designing such algorithms. The k-nearest-neighbor and ε-ball neighborhood criteria, both based on the usual Euclidean norm, are two popular ways of constructing a graph. They are simple to compute and easy to operate, but their two parameters, the neighborhood size k and the ball radius ε, are sensitive to noise and difficult to determine in many real problems. Qiao [13] and Cheng [14] make use of sparse representation to construct a novel graph, the $\ell_1$-graph, which inherits many merits of sparse reconstruction and builds the graph adaptively, without model parameters. However, they do not use prior knowledge of class identities; that is, they are unsupervised.
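The following is a sketch of $\ell_1$-graph construction in the spirit of [13], [14]: each sample is sparsely reconstructed from all remaining samples, and the coefficient vector supplies the edge weights. We substitute scikit-learn's Lasso for the exact $\ell_1$ solver used in those papers, and `alpha` is a hypothetical sparsity parameter:

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_graph(X, alpha=0.05):
    """X: (n, m) row-sample matrix -> (n, n) sparse weight matrix W."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)      # indices of A_i: all but x_i
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[others].T, X[i])             # x_i ~ A_i s_i, s_i sparse
        W[i, others] = lasso.coef_               # nonzero coefficients = edges
    return W

# Toy usage: 20 samples in 10 dimensions.
X_demo = np.random.default_rng(0).normal(size=(20, 10))
W = l1_graph(X_demo)
```

Note that, unlike the k-NN or ε-ball graphs, no neighborhood parameter has to be tuned here; sparsity decides which edges survive.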

In this paper, to enhance the classification performance of the SPP algorithm, we propose a new algorithm, called discriminative learning by sparse representation projections, or DLSP for short. For each sample $x_i$, $i=1,2,\ldots,n$, we use the label information to divide the robust sparse representation coefficients of $x_i$ over the original data into two groups: one corresponding to samples with the same label as $x_i$, and the other to samples with a different label (see the sketch below). To achieve better classification performance, the proposed algorithm incorporates the merits of both the local interclass geometrical structure and the sparsity property: DLSP demands local within-class compactness, and also requires that the samples whose labels differ from that of $x_i$ be close to their respective class centroids. This gives DLSP the advantages of sparse reconstruction and, more importantly, a better capacity for discrimination, especially when the training set is small.
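A small sketch of this grouping step follows (the names are ours; the paper defines the construction formally in Section 3):

```python
import numpy as np

def split_coefficients(s_i, labels_others, label_i):
    """Split x_i's sparse coefficients over A_i by label agreement.

    s_i:           (n-1,) sparse coefficients of x_i over the other samples
    labels_others: labels of the columns of A_i, aligned with s_i
    Returns the same-class and different-class parts (zeros elsewhere).
    """
    same = np.where(labels_others == label_i, s_i, 0.0)
    diff = np.where(labels_others != label_i, s_i, 0.0)
    return same, diff
```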

The rest of the paper is organized as follows: in Section 2, we briefly review the original l1-graph and SPP algorithm, respectively. We propose our new algorithm in Section 3. In Section 4, the experimental results are given, and we conclude this paper in Section 5.

Section snippets

A brief review of the $\ell_1$-graph and SPP

For convenience, we first give some notation used in this paper. Suppose that $X=[x_1,x_2,\ldots,x_n]\in\mathbb{R}^{m\times n}$ is a set of $m$-dimensional samples of size $n$, composed of $c$ classes (the $k$th class contains $n_k$ samples, $k=1,2,\ldots,c$, with $\sum_{k=1}^{c} n_k = n$). $l(x_k)$ denotes the class label of sample $x_k$; for example, $x_k$ comes from the $c$th class if $l(x_k)=c$. The $k$th class centroid is $m_k = (1/n_k)\sum_{l(x_i)=k} x_i$, $k=1,\ldots,c$. For each sample $x_i \in X$, the matrix $A_i$ denotes the sample set $A_i=[x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n]\in\mathbb{R}^{m\times(n-1)}$. The robust sparse representation…
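As a small illustration of this notation (a sketch under our own naming, not the paper's code), the centroids $m_k$ and the leave-one-out matrices $A_i$ can be computed as:

```python
import numpy as np

def class_centroids(X, y):
    """m_k = (1/n_k) * sum over x_i with l(x_i)=k, for X of shape (m, n)."""
    return {k: X[:, y == k].mean(axis=1) for k in np.unique(y)}

def leave_one_out_matrix(X, i):
    """A_i = [x_1, ..., x_{i-1}, x_{i+1}, ..., x_n], shape (m, n-1)."""
    return np.delete(X, i, axis=1)
```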

Discriminative learning by sparse representation projections

The SPP algorithm preserves many merits of LLE and the $\ell_1$-graph, and summarizes the overall behavior of the whole sample set in the sparse reconstruction process, but it is an unsupervised learning method which does not use prior knowledge of class identities. To enhance the classification performance of SPP, we propose a new method, called discriminative learning by sparse representation projections, or DLSP for short, which incorporates the merits of both local geometrical structure and global…

Experiments

In this section, to evaluate the effectiveness of the proposed algorithm for classification, we compare the proposed discriminative learning by sparse representation projections (DLSP) algorithm with five dimensionality reduction methods (PCA, LPP, NPE, SPP and LDA) on several publicly available face databases, including ORL [15], UMIST [16] and JAFFE [17], which involve different illumination, expression and pose changes. In all experiments, we keep 98% information in the sense of…
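The paper does not give preprocessing code; one common reading of this 98% criterion (retaining principal components up to 98% cumulative explained variance) can be realized with scikit-learn as follows, where the data arrays are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 256))   # placeholder for vectorized face images
X_test = rng.normal(size=(20, 256))

# A fractional n_components keeps the smallest number of components whose
# cumulative explained variance ratio reaches that fraction.
pca = PCA(n_components=0.98, svd_solver='full')
X_train_p = pca.fit_transform(X_train)
X_test_p = pca.transform(X_test)
print(pca.n_components_)                # components actually retained
```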

Conclusions

In this paper, based on sparse representation, we have proposed the DLSP algorithm, a new dimensionality reduction method. DLSP incorporates the merits of both local geometrical structure and the global sparsity property. Its advantages are as follows: DLSP preserves discriminative information, especially when the training set is small; DLSP constructs its graph adaptively and avoids the difficulty of parameter selection found in LPP and NPE; and DLSP performs better than the SPP algorithm in…

Acknowledgments

The authors would like to express their gratitude to the anonymous referees as well as the Editor and Associate Editor for their valuable comments, which led to substantial improvements of the paper. This work was supported by the National Basic Research Program of China (973 Program) (Grant no. 2007CB311002) and the National Natural Science Foundation of China (Grant nos. 60675013, 10531030, 61075006/F030401).

References (17)

  • L. Qiao et al., Sparsity preserving projections with applications to face recognition, Pattern Recognition (2010)
  • D.L. Donoho, High-dimensional data analysis: the curses and blessings of dimensionality. Lecture delivered at the...
  • R. Bellman, Adaptive Control Processes: A Guided Tour (1961)
  • Z. Li et al., Nonparametric discriminant analysis for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
  • H. Cevikalp et al., Discriminant common vectors for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)
  • B. Mojaradi et al., Dimensionality reduction of hyperspectral data via spectral feature extraction, IEEE Transactions on Geoscience and Remote Sensing (2009)
  • I.T. Jolliffe, Principal Component Analysis (2002)
  • K. Fukunaga, Introduction to Statistical Pattern Recognition (1990)

Fei Zang received his master's degree from the School of Mathematical Science at the University of Electronic Science and Technology of China in 2007. He is now a Ph.D. candidate in the School of Science at Xi'an Jiaotong University, China. He is interested in pattern recognition and image processing.

Jiangshe Zhang was born in 1962. He received his M.S. and Ph.D. degrees in Applied Mathematics from Xi'an Jiaotong University, Xi'an, China, in 1987 and 1993, respectively. He joined Xi'an Jiaotong University in 1987, where he is currently a full Professor in the Faculty of Science. He has authored or coauthored one monograph and over 50 journal papers on robust clustering, optimization, short-term load forecasting for electric power systems, and related topics. His current research focuses on Bayesian learning, global optimization, computer vision and image processing, support vector machines, neural networks and ensemble learning.
