Elsevier

Neurocomputing

Volume 356, 3 September 2019, Pages 228-243

Dimensionality reduction by collaborative preserving Fisher discriminant analysis

https://doi.org/10.1016/j.neucom.2019.05.014

Highlights

  • Our proposed L1CPFDA/L2CPFDA can preserve the collaborative reconstruction relationship of the data while simultaneously retaining the strong discriminant power of LDA.

  • Both theoretical and experimental analyses reveal that the number of available projection directions of L1CPFDA and L2CPFDA is twice that of LDA.

  • L1CPFDA and L2CPFDA are more stable as the number of dimensions varies, and outperform many competing methods, especially in low dimensions and with small training sample sizes.

Abstract

Sparse representation-based classifier (SRC) and collaborative representation-based classifier (CRC) are two commonly used classifiers. It has been pointed out that the utilization of all the training samples in representing a query sample (i.e., the least square part), which reflects the collaborative representation mechanism of SRC and CRC, is more important for classification than the norm constraint on the coding coefficients. From this perspective, both SRC and CRC can be viewed as collaborative representation (CR) with different norm (i.e., L1 and L2) constraints on the coding coefficients. In this paper, two collaborative preserving Fisher discriminant analysis approaches are proposed for linear dimensionality reduction, in which both the local geometric information hidden in the CR coefficients and the global discriminant information inherited from Fisher/linear discriminant analysis (FDA/LDA) are effectively fused. Specifically, a datum-adaptive graph is first built via CR with an L1 or L2 norm constraint (corresponding to L1CPFDA and L2CPFDA, respectively), and then incorporated into the LDA framework to seek a powerful projection subspace with an analytic solution. Both theoretical and experimental analyses of L1CPFDA and L2CPFDA show that they can best preserve the collaborative reconstruction relationship of the data while discriminating samples of different classes. Moreover, LDA is a special case of L1CPFDA and L2CPFDA, and their number of available projection directions is empirically twice that of LDA. Experimental results on the ORL, AR and FERET face databases and the COIL-20 object database demonstrate their effectiveness, especially in low dimensions and with small training sample sizes.

Introduction

Dimensionality reduction (DR), as an effective means for data analysis tasks such as visualization, classification and clustering, has demonstrated its potential in many domains. The past several decades have witnessed the rapid development of DR techniques [1], [2], [3], [4], [5], [6]. Among them, principal component analysis (PCA) [1] and Fisher/linear discriminant analysis (FDA/LDA) [2] are two of the most well-known, owing to their simplicity and effectiveness. Unlike the unsupervised PCA, which is optimal for data representation, LDA aims at seeking discriminant projection axes that separate patterns with different class labels. Despite its success in practice, LDA still suffers from some limitations. For instance, the number of available projection axes of LDA is theoretically less than the number of classes, because the rank of the between-class scatter matrix is at most C − 1, where C is the number of classes. Note that both PCA and LDA can only see the global Euclidean structure of data, so they cannot accommodate data with complex nonlinear structure. Motivated by the kernel trick in support vector machines (SVM) [7], some nonlinear methods have also been proposed, such as kernel PCA (KPCA) [8], kernel Fisher discriminant (KFD) [9] and the eigenspectrum-based regularized KDA algorithm (ER-KDA) [10]. However, these methods do not explicitly consider the local structure of the data, which is of great importance for classification [11], [12], [13].
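The C − 1 bound on LDA's projection axes follows directly from the rank of the between-class scatter matrix: the C weighted class-mean deviations from the global mean sum to zero, so at most C − 1 of them are linearly independent. A minimal numerical check (toy data and variable names of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: m-dimensional samples from C classes, n_per samples each.
m, C, n_per = 10, 4, 5
X = rng.standard_normal((m, C * n_per))
labels = np.repeat(np.arange(C), n_per)

mu = X.mean(axis=1, keepdims=True)           # global mean
Sb = np.zeros((m, m))                        # between-class scatter
for c in range(C):
    Xc = X[:, labels == c]
    mc = Xc.mean(axis=1, keepdims=True)
    Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T

# Weighted class-mean deviations sum to zero, hence rank(Sb) <= C - 1.
print(np.linalg.matrix_rank(Sb))             # 3, i.e. C - 1
```

For generic data the bound is attained with equality, which is why classical LDA can extract at most C − 1 discriminant directions.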

Previous research efforts have shown that natural images, especially face images, possibly lie on or close to a low-dimensional submanifold embedded in the original high-dimensional ambient space [14], [15], [16]. A large family of manifold learning algorithms has been developed since then. Representative methods include unsupervised ones like Laplacian eigenmap (LE) [16], locally linear embedding (LLE) [14], locality preserving projections (LPP) [12] and neighborhood preserving embedding (NPE) [13], and supervised ones such as marginal Fisher analysis (MFA) [17], double adjacency graphs-based discriminant neighborhood embedding (DAG-DNE) [18] and others presented in [19], [20], [21]. However, these methods are implemented mainly from the local point of view, while the global structure of the data also benefits classification. Recently, some works [11], [22], [23], [24], [25] have attempted to exploit both the local and global structure of the data for classification. For example, locality preserving discriminant projections (LPDP) [11] and locally linear discriminant embedding (LLDE) [22] improved LPP and NPE by combining the modified maximum margin criterion (MMC) with LPP and NPE, respectively. By combining the ideas of LPP and LDA, Sugiyama [23] presented a local Fisher discriminant analysis (LFDA) to deal with multimodally distributed data. Although their motivations differ, all these approaches can be explained and understood in a unified view under the general framework called graph embedding (GE) [17], where graph construction plays a key role. In these methods, the graphs are constructed by the k-nearest neighbor (k-NN) or ɛ-ball criterion, which first seeks the neighbors of each sample manually and then assigns the corresponding edge weights by a Gaussian kernel [16] or a local-reconstruction-based approach [14]. However, these two graph construction schemes suffer from the following weaknesses [26], [27].
Firstly, they usually specify the same predefined parameter k or ɛ for all samples, which may fail to capture the intrinsic structure of the data since local structures vary between samples. Secondly, the value of k or ɛ usually has a great impact on the performance of the final task [28], [29], and the neighbors of each sample are also sensitive to noise and outliers. Finally, they separate sample neighborhood selection and edge weight assignment into two independent steps, leading to suboptimal graphs.
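The two-step scheme criticized above can be sketched in a few lines; note that both `k` and `sigma` below are exactly the hand-tuned parameters whose choice the text says is problematic (the function name and data are our own illustration):

```python
import numpy as np

def knn_gaussian_graph(X, k=3, sigma=1.0):
    """Conventional k-NN graph: step 1 picks neighbors by distance,
    step 2 weights the edges with a Gaussian (heat) kernel."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between the n columns of X.
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]      # skip the sample itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                  # symmetrize

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))               # 20 samples, 5-dimensional
W = knn_gaussian_graph(X, k=3)
```

Every sample gets the same k regardless of its local density, and neighbor selection (argsort) is decoupled from weight assignment (the kernel), which is precisely what the CR-based graphs of the next paragraphs avoid.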

The authors in [26], [30], [31] argued that sample neighborhood selection and edge weight assignment are interrelated steps and thus should not be performed separately. Recently, sparse representation (SR) [32], which reconstructs each sample by a sparse linear combination of all training samples using L1 norm optimization, has received considerable interest. The L1 norm optimization automatically produces a sparse solution, which can be used as an indication of the neighbors of each sample. Moreover, the resulting sparse coefficients mainly characterize the locality of the data, which essentially reflects the relation among the samples, and can thus be utilized as the edge weights of the graph. That is to say, sample neighborhood selection and edge weight assignment can be performed within one step by SR. Based on SR, Wright et al. [32] gave a sparse representation based classification (SRC) for face recognition. Qiao et al. [31] proposed sparsity preserving projections (SPP), whose aim is to preserve the sparse reconstruction relationship of the data. Yan et al. [26], [30] advocated the L1-graph for semi-supervised learning, clustering and subspace learning, whose core idea is similar to SPP. As an extension of SPP, graph optimization for dimensionality reduction with sparsity constraints (GODRSC) [33] learns the L1 graph and the projection matrix iteratively. With the available label information of the training samples, Yang et al. [34] developed an iterative SRC-steered discriminative projection (SRC-DP) approach for feature extraction, designed according to the decision rule of SRC. The optimal linear transformation of SRC-DP is achieved by maximizing the between-class reconstruction residual and minimizing the within-class reconstruction residual in the low-dimensional projected space. Using SRC as the classifier, SRC-DP performs very well in face recognition. Gui et al. [35] gave discriminant sparse neighborhood preserving embedding (DSNPE) by integrating a class-specific sparse graph and MMC [5] for face recognition. However, if the number of training samples in each class is small, the class-specific SR may be inaccurate. Besides seeking sparse graphs for DR, some approaches are devoted to pursuing sparse projection vectors, such as sparse PCA (SPCA) [36], the double shrinking algorithm (DSA) [37] and sparse two-dimensional locality discriminant projections (S2DLDP) [38], and some others adopt SR to learn discriminant dictionaries for better classification [39], [40]. Although SR-based approaches have achieved impressive results in various fields, their computational complexity can be very high since they need to solve the L1 norm optimization problem iteratively. Moreover, the working mechanism of SR is not yet fully understood and needs further investigation.
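The one-step construction described above, where selecting neighbors and assigning edge weights both fall out of a single L1-regularized least squares problem, can be sketched as follows. The solver (ISTA, a standard iterative shrinkage-thresholding scheme) and all names here are our own choices for illustration, not the specific algorithm of any cited paper:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(D, y, lam=0.1, n_iter=500):
    """Solve min_w 0.5*||y - D w||^2 + lam*||w||_1 by ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        w = soft(w + D.T @ (y - D @ w) / L, lam / L)
    return w

def l1_graph(X, lam=0.1):
    """Code each sample over all remaining samples; the nonzero
    coefficients pick the neighbors AND supply the edge weights."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        D = np.delete(X, i, axis=1)        # dictionary = all other samples
        w = sparse_code_ista(D, X[:, i], lam)
        W[i, np.arange(n) != i] = w
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W = l1_graph(X, lam=0.2)                   # 8x8 weight matrix, zero diagonal
```

The per-sample iterative solve is also where the high computational cost mentioned above comes from, which motivates the L2 alternative discussed next.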

Great efforts have been made to understand the working mechanism of SR [41], [42], [43], [44]. For example, Zhang et al. [43], [44] claimed that it is the collaborative representation mechanism, rather than the L1 norm constraint on the representation coefficients, that contributes more to the success of SRC for face recognition. In other words, the collaborative representation mechanism is reflected in the use of a linear combination of all the training samples to represent a query sample (i.e., the least square part), not in the norm constraint on the coding coefficients (i.e., the regularization part). Based on this finding, they presented collaborative representation based classification (CRC) [43] by simply replacing the expensive L1 norm in SRC with the cheaper L2 norm. From this point of view, both SRC and CRC can be viewed as collaborative representation, but with different norm constraints (i.e., L1 and L2 norm, respectively), and both belong to the regularized least square framework. Although the L2 norm in CRC induces much weaker sparsity in the representation coefficients than the L1 norm, it admits an efficient closed-form solution and leads to very competitive performance [44]. Some kernel variants of CRC were also presented in [45], [46], [47] to further boost the performance of CRC. As indicated in [31], [48], CR mainly characterizes the local geometric information of the data, as SR does. To make full use of these nice properties, Yang et al. [48] first built an L2 graph by CR (with L2 norm regularization) and then designed an unsupervised collaborative representation based projections (CRP) method to preserve the collaborative reconstruction relationship of the data. In [49], Hua et al. proposed collaborative representation reconstruction based projections (CRRP) for DR by considering the classification rule of CRC. A similar method was also presented in [50].
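The closed-form solution that makes the L2 variant cheap is just ridge regression over the training dictionary: w = (DᵀD + λI)⁻¹Dᵀy. A minimal sketch (dictionary, query and parameter values are our own toy choices):

```python
import numpy as np

def l2_collab_code(D, y, lam=1e-4):
    """Collaborative representation with L2 regularization:
    w = argmin ||y - D w||^2 + lam*||w||^2 = (D^T D + lam I)^{-1} D^T y,
    obtained in one linear solve instead of an iterative L1 optimization."""
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 12))        # 12 training samples, 8-dimensional
y = D @ rng.standard_normal(12)         # a query in the span of the dictionary
w = l2_collab_code(D, y)                # dense coefficients, small residual
```

Unlike the L1 solution, w is generally dense, but one solve per query (or one matrix inverse shared by all queries) replaces hundreds of shrinkage iterations, which is the efficiency argument made in [43], [44].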
The authors in [51] developed a supervised method called regularized least square based discriminative projections (RLSDP) for feature extraction. RLSDP obtains a discriminant subspace by simultaneously maximizing the between-class scatter of LDA and minimizing the collaborative reconstruction error from the same class. Nevertheless, RLSDP fails to minimize the distances between samples with the same class label, which carry the most important compactness information [52], and it shares LDA's limitation that the number of available projection axes is less than the number of classes. In addition, the joint discriminative dimensionality reduction and dictionary learning (JDDRDL) method [53] couples discriminative DR and dictionary learning in a unified energy minimization framework, which further enhances the representation accuracy and discriminant ability of CRC. However, it has up to seven hyperparameters (including the size of the dictionary), which makes it hard to use in real applications.

In this paper, two collaborative preserving Fisher discriminant analysis (CPFDA) approaches are proposed for DR. We denote them by L1CPFDA and L2CPFDA hereafter, according to the norm (i.e., L1 or L2) applied to the coding coefficients during graph construction. More specifically, an L1 graph (corresponding to the L1 norm) or L2 graph (corresponding to the L2 norm) is first built via least squares with the respective norm regularization, and subsequently incorporated into the LDA framework to search for a discriminant projection subspace. We summarize several characteristics of our proposed L1CPFDA and L2CPFDA as follows:

  • Our proposed L1CPFDA and L2CPFDA adopt CR with different norm regularizations (i.e., L1 and L2) on the coding coefficients for L1 graph and L2 graph construction, respectively. The whole process is data adaptive, with no need to specify local neighborhood parameters.

  • Both L1CPFDA and L2CPFDA can simultaneously exploit the local geometric information hidden in the CR coefficients and the global discriminant information inherited from LDA, since the L1 graph [31] and L2 graph [48] mainly characterize the local geometric structure while LDA well captures the global discriminant structure of the data set. Theoretical and experimental analyses of L1CPFDA and L2CPFDA also reveal that they can best preserve the collaborative reconstruction relationship of the data while discriminating samples of different classes.

  • Our L1CPFDA and L2CPFDA are able to obtain more meaningful projection vectors than LDA, thanks to the localization properties of the L1 graph and L2 graph. Further investigation of the ranks of the corresponding scatter matrices shows that the achievable number of projection directions of the proposed approaches is generally twice that of LDA. Moreover, both L1CPFDA and L2CPFDA include LDA as a special case.
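The graph-embedding recipe these bullets describe, regularize LDA's objective with a Laplacian term built from the CR graph, can be sketched generically. The exact CPFDA objective appears in Section 3; the function below, including the trade-off parameter `alpha` and the generalized eigenproblem it solves, is only our hypothetical illustration of how a graph term and the LDA scatters can be fused:

```python
import numpy as np

def graph_lda_directions(X, labels, W, alpha=1.0, lam=1e-6):
    """Hypothetical sketch: solve  Sb v = eig * (Sw + alpha * X L X^T) v,
    where L is the Laplacian of a (possibly CR-derived) affinity graph W.
    With alpha = 0 this reduces to ordinary regularized LDA."""
    m, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((m, m))
    Sw = np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T
        Sw += (Xc - mc) @ (Xc - mc).T
    Ws = (W + W.T) / 2                               # symmetrized graph
    L = np.diag(Ws.sum(axis=1)) - Ws                 # graph Laplacian
    A = Sw + alpha * X @ L @ X.T + lam * np.eye(m)   # regularized denominator
    vals, vecs = np.linalg.eig(np.linalg.solve(A, Sb))
    order = np.argsort(-vals.real)
    return vecs[:, order].real                       # columns = directions
```

Because the Laplacian term contributes rank beyond Sb's C − 1, such a fused objective can yield more usable directions than plain LDA, consistent with the "twice that of LDA" observation above.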

The remainder of this paper is organized as follows. In Section 2, we briefly review CRP and LDA. The unified CR framework and the proposed L1CPFDA and L2CPFDA, along with their properties, computational complexity and connections with other methods, are presented in Section 3. The experimental results are presented in Section 4. Finally, concluding remarks and future work are given in Section 5.

Section snippets

Related works

Suppose there are n training samples X = [x_1, x_2, …, x_n] ∈ R^(m×n) belonging to C classes, where x_i ∈ R^m is the ith sample. Let n_c be the number of samples in the cth class, so that ∑_{c=1}^{C} n_c = n. In what follows, we make a brief review of the representative CRP and LDA methods.

Collaborative preserving Fisher discriminant analysis (CPFDA)

Here, we first present a unified collaborative representation framework (UCRF) that covers both CR and SR. Then, we put forward the two CPFDA approaches (i.e., L1CPFDA and L2CPFDA) in detail, analyze their computational complexity and discuss their fundamental properties. Finally, we compare our proposed methods with other related ones.

Experimental results

In this section, we conduct experiments on the ORL, AR and FERET face databases and the COIL-20 object database to evaluate the effectiveness of our proposed L1CPFDA and L2CPFDA.

Conclusions and further work

We have presented in this paper two supervised dimensionality reduction methods, coined collaborative preserving Fisher discriminant analysis (L1CPFDA/L2CPFDA), for image recognition tasks. In L1CPFDA/L2CPFDA, an L1/L2 graph is first built by collaborative representation with L1/L2 norm regularization, which can be derived efficiently (in closed form in the L2 case). Different from MFA and LFDA, the L1/L2 graph is datum adaptive, and the manual selection of neighbors for each sample is avoided.

Conflict of interest

None.

Acknowledgments

The authors would like to thank Editors and anonymous reviewers for their valuable comments and suggestions to improve the quality of this paper. This work was supported by the National Natural Science Foundation of China under Grants 61271293 and 61803293.


References (71)

  • L. Zhang et al., Graph optimization for dimensionality reduction with sparsity constraints, Pattern Recognit. (2012)
  • J. Gui et al., Discriminant sparse neighborhood preserving embedding for face recognition, Pattern Recognit. (2012)
  • Z. Lai et al., Sparse two-dimensional local discriminant projections for feature extraction, Neurocomputing (2011)
  • M. Yang et al., Gabor feature based robust representation and classification for face recognition with Gabor occlusion dictionary, Pattern Recognit. (2013)
  • J. Yang et al., Beyond sparsity: the role of L1-optimizer in pattern classification, Pattern Recognit. (2012)
  • W. Yang et al., Image classification using kernel collaborative representation with regularized least square, Appl. Math. Comput. (2013)
  • D. Wang et al., Kernel collaborative face recognition, Pattern Recognit. (2015)
  • W. Yang et al., A collaborative representation based projections method for feature extraction, Pattern Recognit. (2015)
  • J. Hua et al., Dimension reduction using collaborative representation reconstruction based projections, Neurocomputing (2016)
  • J. Yin et al., Optimized projection for collaborative representation based classification and its applications to face recognition, Pattern Recognit. Lett. (2016)
  • W. Yang et al., A regularized least square based discriminative projections for feature extraction, Neurocomputing (2016)
  • M.-D. Yuan et al., Enhanced regularized least square based discriminative projections for feature extraction, Signal Process. (2017)
  • Z. Feng et al., Joint discriminative dimensionality reduction and dictionary learning for face recognition, Pattern Recognit. (2013)
  • M.-D. Yuan et al., Collaborative representation discriminant embedding for image classification, J. Vis. Commun. Image Represent. (2016)
  • P.J. Phillips et al., The FERET database and evaluation procedure for face-recognition algorithms, Image Vis. Comput. (1998)
  • M. Turk et al., Eigenfaces for recognition, J. Cogn. Neurosci. (1991)
  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
  • D.D. Lee et al., Learning the parts of objects by non-negative matrix factorization, Nature (1999)
  • N. Guan et al., NeNMF: an optimal gradient method for nonnegative matrix factorization, IEEE Trans. Signal Process. (2012)
  • H. Li et al., Efficient and robust feature extraction by maximum margin criterion, IEEE Trans. Neural Netw. (2006)
  • M.A. Hearst et al., Support vector machines, IEEE Intell. Syst. Appl. (1998)
  • K.I. Kim et al., Face recognition using kernel principal component analysis, IEEE Signal Process. Lett. (2002)
  • S. Mika et al., Fisher discriminant analysis with kernels
  • S. Zafeiriou et al., Regularized kernel discriminant analysis with a robust kernel for face recognition and verification, IEEE Trans. Neural Netw. Learn. Syst. (2012)
  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)

    Ming-Dong Yuan received the B.S. Degree in Agricultural Electrification and Automation from Sichuan Agricultural University, Ya'an, China, in 2009, and the Ph.D. degree in Signal and Information Processing at Xidian University, Xi'an, China, in 2017. Now, he is with CETC Key Laboratory of Smart City Modeling Simulation and Intelligent Technology, Shenzhen, China. His current research interests involve subspace learning, feature selection, matrix factorization, sparse and low-rank representation.

    Da-Zheng Feng received the Diploma degree from Xi'an University of Technology, Xi'an, China, in 1982, the M. S. degree from Xi'an Jiaotong University, Xi'an, China, in 1986, and the Ph.D. degree in Electronic Engineering from Xidian University, Xi'an, China, in 1995. From May 1996 to May 1998, he was a Postdoctoral Research Affiliate with Xi'an Jiaotong University, China. From May 1998 to June 2000, he was an Associate Professor with Xidian University. Since July 2000, he has been a Professor at Xidian University. His current research interests include signal processing, intelligence and brain information processing, and radar techniques.

    Ya Shi received the B.S. Degree in Electronic and Information Engineering from Xidian University, Xi'an, China, in 2008, and the M.S. Degree in Pattern Recognition and Intelligent System from Xidian University in 2011, and the Ph.D. Degree in Pattern Recognition and Intelligent System from the same university in 2015. Since then, she has been a lecturer in Xi'an University of Architecture and Technology, Xi'an, China. Her current research interests include Machine Learning and Pattern Recognition.

    Wen-Juan Liu received the B.S. Degree in Electronic and Information Engineering from Xidian University, Xi'an, China, in 2009, and the Ph.D. degree in Signal and Information Processing from the same university, in 2016. After graduation, she joined Beijing Xiaomi Intelligent Technology Co., Ltd, and currently works as a Signal Processing Engineer for the design of Binaural Speech Separation System in Reverberant Environments. Her current research interests include blind speech separation, adaptive signal processing and statistical signal processing.
