Signal Processing

Volume 139, October 2017, Pages 182-189

Short communication

Enhanced regularized least square based discriminative projections for feature extraction

https://doi.org/10.1016/j.sigpro.2017.04.018

Abstract

The regularized least square based discriminative projections (RLSDP) method for feature extraction was recently proposed; it seeks discriminant projection directions that maximize the between-class scatter and minimize the within-class compactness. However, in RLSDP each sample is reconstructed using only the coefficients associated with its own class, which can lead to large reconstruction errors. Moreover, the distances between each sample and the other within-class samples, which carry the most important within-class compactness information, are not minimized in RLSDP. To address these two problems, we propose an enhanced regularized least square based discriminative projections (ERLSDP) method. ERLSDP uses all the representation coefficients of each sample for reconstruction and explicitly minimizes the distances between all the within-class samples; it therefore achieves better reconstruction accuracy and has more discriminating power than RLSDP. Experimental results demonstrate that ERLSDP gives a clear improvement over RLSDP when the training sample size is small.

Introduction

Feature extraction, which aims to produce compact and effective low-dimensional feature representations of high-dimensional data, has been extensively studied over the past several decades. Compared with the global principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] approaches, manifold learning methods are more appealing since they can discover the local intrinsic structure of data. Representative manifold learning methods include locality preserving projections (LPP) [3], locality preserving discriminant projections (LPDP) [4], discriminative locality alignment (DLA) [5], discriminant locality preserving projections (DLPP) [6], marginal Fisher analysis (MFA) [7], etc. Although their motivations differ, they can all be unified in the graph embedding (GE) framework [7], and their differences lie in graph construction. Manifold learning has found wide application in various fields. For example, Li et al. [8] developed a discriminative distance metric learning (DML) algorithm based on manifold learning, and further derived a distributed and parallel computational scheme to handle the large-scale metric learning problem. Reference [9] exploited manifold learning to analyze multivariate variable-length sequence data. Gao et al. [10] integrated local and global manifold structures for face and image classification.

Recently, sparse representation has shown promising performance in many domains [11], [12], [13], [14], [15]. For instance, Wright et al. [11] proposed a sparse representation based classification (SRC) method for face recognition. Zhou et al. [12] proposed a double shrinking algorithm (DSA) for sparse projection eigenvectors. Moreover, many research efforts [16], [17] have shown that the neighborhood relationship of each data point can be adaptively obtained by sparse representation methods, and that the resulting ℓ1-graph is robust to noise. Based on the ℓ1-graph, Qiao et al. [16] proposed sparsity preserving projections (SPP) for feature extraction, which aims at preserving the sparse reconstruction relationship of the data in both the original space and the low-dimensional embedding space. By combining supervised SPP with the maximum margin criterion, Gui et al. [18] introduced a discriminant sparse neighborhood preserving embedding (DSNPE) algorithm. Gao et al. [10] gave a discriminative sparsity preserving projections (DSPP) method, which first employs sparse representation to build an intrinsic graph and a penalty graph, and then integrates the global within-class structure for dimensionality reduction. Despite their good performance, sparse representation methods need to solve an ℓ1 norm minimization problem, which incurs high computational complexity.
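To make the cost of that coding step concrete, the following is a minimal sketch of solving the ℓ1-regularized coding problem with the iterative shrinkage-thresholding algorithm (ISTA). It only illustrates the kind of iterative solver such methods require (this paper itself uses the ℓ1-magic solver); the function name, regularization value, and iteration count are assumptions.

```python
import numpy as np

def ista_l1(X, y, lam=0.01, n_iter=500):
    """Minimal ISTA sketch for the l1 coding problem used by SRC/SPP-style
    methods: min_s 0.5 * ||y - X s||_2^2 + lam * ||s||_1.

    X : (m, n) dictionary whose columns are training samples
    y : (m,) sample to be coded; lam and n_iter are illustrative values.
    """
    t = 1.0 / np.linalg.norm(X, 2) ** 2          # step size 1/L, L = ||X||_2^2
    s = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = s - t * (X.T @ (X @ s - y))          # gradient step on the quadratic term
        s = np.sign(g) * np.maximum(np.abs(g) - t * lam, 0.0)  # soft-thresholding
    return s
```

Each iteration costs a pair of matrix-vector products and many iterations are typically needed, which is the complexity gap that collaborative representation closes with a single closed-form solve.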

Zhang et al. [19], [20] argued that the collaborative representation mechanism is the key factor behind the success of SRC, and proposed a collaborative representation based classification (CRC) method. CRC replaces the ℓ1 norm in SRC with the simpler ℓ2 norm, and has properties similar to SRC with competitive classification performance. Based on CRC, Yang et al. [21] constructed an ℓ2-graph and developed collaborative representation based projections (CRP) to preserve the collaborative reconstruction relationship of the data. Yin et al. [22] proposed collaborative representation reconstruction based projections (CRRP), in which the projection matrix is obtained by maximizing the collaborative reconstruction between-class scatter and minimizing the collaborative reconstruction within-class scatter. A similar method was proposed in [23]. In [24], Yang et al. developed regularized least square based discriminative projections (RLSDP), which maximizes the between-class scatter adopted by LDA and minimizes the within-class compactness via the reconstruction residual from the same class. However, RLSDP has two main problems. First, reconstructions that use only the coefficients corresponding to the same class can have large errors, so RLSDP cannot give the best reconstruction of each sample. Second, it does not minimize the distances between each sample and the other within-class samples, which is important for minimizing the within-class compactness.
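Because the ℓ2 penalty turns the coding problem into a ridge regression, the collaborative coefficients have a closed-form solution, which is what makes CRC cheap compared with SRC. Below is a minimal sketch under assumed names and a default λ, together with the class-wise regularized-residual decision rule reported for CRC [19], [20]; it is an illustration, not the authors' code.

```python
import numpy as np

def crc_code(X, y, lam=0.001):
    """Collaborative representation coding: solves
    min_s ||y - X s||_2^2 + lam * ||s||_2^2 in closed form,
    coding the query y over the whole training dictionary X.

    X : (m, n) columns are training samples; y : (m,) query;
    lam is an assumed default and should be tuned per dataset.
    """
    n = X.shape[1]
    # Ridge solution: s = (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def crc_classify(X, labels, y, lam=0.001):
    """Assigns y to the class with the smallest regularized residual
    ||y - X_c s_c|| / ||s_c||, as in the CRC decision rule.
    labels : (n,) integer array of class labels for the columns of X."""
    s = crc_code(X, y, lam)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        residuals[c] = (np.linalg.norm(y - X[:, mask] @ s[mask])
                        / np.linalg.norm(s[mask]))
    return min(residuals, key=residuals.get)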

To address the above two problems, we propose an enhanced regularized least square based discriminative projections (ERLSDP). In ERLSDP, each sample is reconstructed by all the associated coefficients, which results in smaller reconstruction error. More importantly, the distances between each sample and all its reconstructed within-class samples, which characterize the most important within-class compactness, are minimized. The optimal discriminant projection of ERLSDP is achieved by maximizing the between-class scatter and minimizing the within-class compactness simultaneously. Experiments on three face databases indicate that our ERLSDP performs better than RLSDP.

The main contributions of our work are as follows. (1) We use all of the representation coefficients to reconstruct each sample, whereas the original RLSDP uses only the partial coefficients corresponding to the same class. Our ERLSDP therefore achieves smaller reconstruction error and better classification performance. (2) We build a weight matrix that explicitly characterizes the within-class geometry of the data, and minimize the distances between all the within-class samples; see the sketch after this paragraph. Meanwhile, by maximizing the between-class scatter, samples sharing the same class label are pulled together and those from different classes are pushed apart, which is a very desirable property for classification tasks.
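To make the second contribution concrete, the sketch below shows the generic scatter-ratio recipe that such criteria instantiate: a binary within-class weight matrix whose graph Laplacian sums the squared pairwise distances between projected within-class samples, traded off against the LDA between-class scatter. It deliberately simplifies ERLSDP, whose actual objective also involves the representation coefficients; the function name, ridge term, and solver choice are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_projection(X, labels, d):
    """Simplified scatter-ratio sketch: maximize the LDA between-class
    scatter against a within-class compactness term built from a
    binary within-class weight matrix.

    X : (m, n) columns are samples; labels : (n,) integer array;
    d : target dimensionality. Returns an (m, d) projection matrix."""
    m, n = X.shape
    mean = X.mean(axis=1, keepdims=True)

    # W_ij = 1 iff x_i and x_j share a class label (diagonal excluded).
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)
    Lap = np.diag(W.sum(axis=1)) - W     # graph Laplacian of W
    Sw = X @ Lap @ X.T                   # sums squared within-class distances

    # Between-class scatter, as in LDA.
    Sb = np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        diff = Xc.mean(axis=1, keepdims=True) - mean
        Sb += Xc.shape[1] * (diff @ diff.T)

    # Generalized eigenproblem Sb p = lambda Sw p; the small ridge keeps
    # Sw positive definite (an assumption for numerical stability).
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(m))
    return vecs[:, np.argsort(vals)[::-1][:d]]   # top-d directions
```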

The rest of this paper is structured as follows. In Section 2, the regularized least square (RLS) and RLSDP are briefly reviewed. The proposed ERLSDP is detailed in Section 3. The experimental results are illustrated in Section 4, and the conclusions are given in Section 5.

RLS and RLSDP

Let $X=[x_1,x_2,\ldots,x_n]\in\mathbb{R}^{m\times n}$ be a set of n training samples from C classes, where $x_i\in\mathbb{R}^m$ is the ith sample. Based on the class labels, X can also be partitioned as $X=[X_1,X_2,\ldots,X_C]$, where $X_c=[x_1^c,x_2^c,\ldots,x_{n_c}^c]\in\mathbb{R}^{m\times n_c}$ contains the samples of class c, $x_j^c$ denotes the jth sample in the cth class, and $n_c$ is the number of samples in class c.
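With this notation, a minimal sketch of the regularized least square coding that RLSDP builds on might look as follows. The leave-one-out dictionary (each sample is coded over the remaining n−1 samples) and the default λ are assumptions consistent with the description above, not the authors' code.

```python
import numpy as np

def rls_coefficients(X, lam=0.01):
    """Regularized least square (RLS) coding of each training sample:
        min_s ||x_i - X_{-i} s||_2^2 + lam * ||s||_2^2,
    solved in closed form for every column x_i of X.

    X : (m, n) columns are training samples; lam is an assumed default.
    Returns an (n, n) matrix S with S[i, i] = 0, where column i holds
    the coefficients that reconstruct x_i from the other samples."""
    m, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)   # exclude x_i from its own dictionary
        Xi = X[:, idx]
        s = np.linalg.solve(Xi.T @ Xi + lam * np.eye(n - 1), Xi.T @ X[:, i])
        S[idx, i] = s
    return S
```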

Motivations

It is seen from Eq. (2) that, to minimize the within-class compactness, RLSDP only minimizes the reconstruction error between each sample $x_i$ and its reconstruction from the coefficients $s_i^+$, whose elements are defined in Eq. (4). There are two main problems. First, using $s_i^+$ to reconstruct $x_i$ incurs a larger error, since the non-zero entries of $s_i^+$ are associated only with the same class as $x_i$. Second, RLSDP neglects the within-class geometry, which is very important for characterizing the within-class compactness.
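The first problem can be checked directly: zero out every coefficient that does not belong to $x_i$'s class (an $s_i^+$-style vector) and compare the two reconstruction errors. The hypothetical helper below reuses the coefficient matrix S from the RLS sketch in the previous section; all names are assumptions.

```python
import numpy as np

def reconstruction_errors(X, labels, S, i):
    """Compares reconstructing x_i from class-restricted coefficients
    (RLSDP-style s_i^+) versus from all of its coefficients (ERLSDP-style).

    S : (n, n) coefficient matrix with S[i, i] = 0, e.g. from
    rls_coefficients above; labels : (n,) integer array."""
    s = S[:, i]
    # s_i^+: keep only the coefficients of x_i's own class.
    s_plus = np.where(labels == labels[i], s, 0.0)

    err_class_only = np.linalg.norm(X[:, i] - X @ s_plus)   # RLSDP-style
    err_all = np.linalg.norm(X[:, i] - X @ s)               # ERLSDP-style
    # err_all is typically the smaller of the two, which is the
    # motivation for using all coefficients in ERLSDP.
    return err_class_only, err_all
```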

Experimental results

To show the effectiveness of ERLSDP, we compare it with CRP [21], LDA [2], MFA [7], DSNPE [18], DLPP [6], CRRP [23] and RLSDP [24] on three face databases, namely ORL [25], AR [26] and FERET [27]. For MFA, we empirically set the neighbor parameter $k_1$ to $n_i-1$ and select $k_2$ from $\{C, 3C, 5C, 7C, 9C\}$, where $n_i$ and C are the number of training samples in class i and the number of classes, respectively. The publicly available solver ℓ1-magic (http://users.ece.gatech.edu/~justin/l1magic/) is used for solving the ℓ1 norm minimization problems.

Conclusions and future work

In this paper, we propose ERLSDP for feature extraction. Compared with the original RLSDP, ERLSDP utilizes all the representation coefficients of each sample, so it achieves better reconstruction accuracy. In addition, ERLSDP explicitly minimizes the distances between all the within-class samples at the same time. It is therefore able to make the within-class samples more compact, which is desirable for classification. Experimental results on three face databases validate its effectiveness.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant 61271293.

References (38)

  • W. Yang et al., A regularized least square based discriminative projections for feature extraction, Neurocomputing (2016)

  • P.J. Phillips et al., The FERET database and evaluation procedure for face-recognition algorithms, Image Vis. Comput. (1998)

  • G.-F. Lu et al., L1-norm and maximum margin criterion based discriminant locality preserving projections via trace Lasso, Pattern Recognit. (2016)

  • C.-X. Ren et al., Robust classification using ℓ2,1-norm based regression model, Pattern Recognit. (2012)

  • M. Turk et al., Eigenfaces for recognition, J. Cogn. Neurosci. (1991)

  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)

  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)

  • T. Zhang et al., Patch alignment for dimensionality reduction, IEEE Trans. Knowl. Data Eng. (2009)

  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)