Elsevier

Neurocomputing

Volume 356, 3 September 2019, Pages 228-243

Dimensionality reduction by collaborative preserving Fisher discriminant analysis

https://doi.org/10.1016/j.neucom.2019.05.014

Highlights

  • Our proposed L1CPFDA/L2CPFDA can preserve the collaborative reconstruction relationship of the data while simultaneously retaining the strong discriminant power of LDA.

  • Both theoretical and experimental analyses reveal that the number of available projection directions of L1CPFDA and L2CPFDA is twice that of LDA.

  • L1CPFDA and L2CPFDA are more stable as the number of dimensions varies, and outperform many competing methods, especially in low dimensions and with small training sample sizes.

Abstract

Sparse representation-based classifier (SRC) and collaborative representation-based classifier (CRC) are two commonly used classifiers. It has been pointed out that the utilization of all the training samples in representing a query sample (i.e., the least square part), which reflects the collaborative representation mechanism of SRC and CRC, is more important for classification than the norm constraint on the coding coefficients. From this perspective, both SRC and CRC can be viewed as collaborative representation (CR) with different norm (i.e., L1 and L2) constraints on the coding coefficients. In this paper, two collaborative preserving Fisher discriminant analysis approaches are proposed for linear dimensionality reduction, in which both the local geometric information hidden in the CR coefficients and the global discriminant information inherited from Fisher/linear discriminant analysis (FDA/LDA) are effectively fused. Specifically, a datum-adaptive graph is first built via CR with an L1 or L2 norm constraint (corresponding to L1CPFDA and L2CPFDA, respectively), and then incorporated into the LDA framework to seek a powerful projection subspace with an analytic solution. Both theoretical and experimental analyses of L1CPFDA and L2CPFDA show that they can best preserve the collaborative reconstruction relationship of the data while discriminating samples of different classes. Moreover, LDA is a special case of L1CPFDA and L2CPFDA, and their number of available projection directions is empirically twice that of LDA. Experimental results on the ORL, AR and FERET face databases and the COIL-20 object database demonstrate their effectiveness, especially in low dimensions and with small training sample sizes.

Introduction

Dimensionality reduction (DR), as an effective means for data analysis tasks such as visualization, classification and clustering, has demonstrated its potential in many domains. The past several decades have witnessed the rapid development of DR techniques [1], [2], [3], [4], [5], [6]. Among them, principal component analysis (PCA) [1] and Fisher/linear discriminant analysis (FDA/LDA) [2] are two of the most well-known, owing to their simplicity and effectiveness. Unlike the unsupervised PCA, which is optimal for data representation, LDA aims at seeking discriminant projection axes that separate patterns with different class labels. Despite its success in practice, LDA still suffers from some limitations. For instance, the number of available projection axes of LDA is theoretically less than the number of classes, because the rank of the between-class scatter matrix is at most C − 1, where C is the number of classes. Note that both PCA and LDA can only see the global Euclidean structure of data, so they cannot accommodate data with complex nonlinear structure. Motivated by the kernel trick in support vector machines (SVM) [7], some nonlinear methods have also been proposed, such as kernel PCA (KPCA) [8], kernel Fisher discriminant (KFD) [9] and the eigenspectrum-based regularized KDA algorithm (ER-KDA) [10]. However, these methods do not explicitly consider the local structure of the data, which is of great importance for classification [11], [12], [13].
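The C − 1 bound on LDA's projection axes follows directly from the rank of the between-class scatter matrix: the C weighted class-mean deviations from the global mean sum to zero, so at most C − 1 of them are linearly independent. A minimal numerical check (toy data and variable names of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: m-dimensional samples from C classes, n_per samples each.
m, C, n_per = 10, 4, 5
X = rng.standard_normal((m, C * n_per))
labels = np.repeat(np.arange(C), n_per)

mu = X.mean(axis=1, keepdims=True)           # global mean
Sb = np.zeros((m, m))                        # between-class scatter
for c in range(C):
    Xc = X[:, labels == c]
    mc = Xc.mean(axis=1, keepdims=True)
    Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T

# Weighted class-mean deviations sum to zero, hence rank(Sb) <= C - 1.
print(np.linalg.matrix_rank(Sb))             # 3, i.e. C - 1
```

For generic data the bound is attained with equality, which is why classical LDA can extract at most C − 1 discriminant directions.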

Previous research efforts have shown that natural images, especially face images, possibly lie on or close to a low-dimensional submanifold embedded in the original high-dimensional ambient space [14], [15], [16]. A large family of manifold learning algorithms has been developed since then. Representative methods include unsupervised ones like Laplacian eigenmap (LE) [16], locally linear embedding (LLE) [14], locality preserving projections (LPP) [12] and neighborhood preserving embedding (NPE) [13], and supervised ones such as marginal Fisher analysis (MFA) [17], double adjacency graphs-based discriminant neighborhood embedding (DAG-DNE) [18] and others presented in [19], [20], [21]. However, these methods are implemented mainly from the local point of view, while the global structure of the data also benefits classification. Recently, some works [11], [22], [23], [24], [25] have attempted to exploit both the local and global structure of the data for classification. For example, locality preserving discriminant projections (LPDP) [11] and locally linear discriminant embedding (LLDE) [22] improved LPP and NPE by combining the modified maximum margin criterion (MMC) with LPP and NPE, respectively. By combining the ideas of LPP and LDA, Sugiyama [23] presented a local Fisher discriminant analysis (LFDA) to deal with multimodally distributed data. Although their motivations differ, all these approaches can be explained and understood in a unified view under the general framework called graph embedding (GE) [17], where graph construction plays a key role. In these methods, the graphs are constructed by the k-nearest neighbor (k-NN) or ɛ-ball criterion, which first seeks the neighbors of each sample manually and then assigns the corresponding edge weights by a Gaussian kernel [16] or a local-reconstruction-based approach [14]. However, these two graph construction schemes suffer from the following weaknesses [26], [27].
Firstly, they usually specify the same predefined parameter k or ɛ for all samples, which may fail to capture the intrinsic structure of the data since local structures vary between samples. Secondly, the value of k or ɛ usually has a great impact on the performance of the final task [28], [29], and the neighbors of each sample are also sensitive to noise and outliers. Finally, they separate sample neighborhood selection and edge weight assignment into two independent steps, leading to suboptimal graphs.
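The two-step scheme criticized above can be sketched in a few lines; note that both `k` and `sigma` below are exactly the hand-tuned parameters whose choice the text says is problematic (the function name and data are our own illustration):

```python
import numpy as np

def knn_gaussian_graph(X, k=3, sigma=1.0):
    """Conventional k-NN graph: step 1 picks neighbors by distance,
    step 2 weights the edges with a Gaussian (heat) kernel."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between the n columns of X.
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]      # skip the sample itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                  # symmetrize

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))               # 20 samples, 5-dimensional
W = knn_gaussian_graph(X, k=3)
```

Every sample gets the same k regardless of its local density, and neighbor selection (argsort) is decoupled from weight assignment (the kernel), which is precisely what the CR-based graphs of the next paragraphs avoid.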

The authors in [26], [30], [31] argued that sample neighborhood selection and edge weight assignment are interrelated steps and thus should not be performed separately. Recently, sparse representation (SR) [32], which reconstructs each sample by a sparse linear combination of all training samples using L1 norm optimization, has received considerable interest. The L1 norm optimization automatically produces a sparse solution, which can be used as an indication of the neighbors of each sample. Moreover, the resulting sparse coefficients mainly characterize the locality of the data, which essentially reflects the relation among the samples, and can thus be utilized as the edge weights of the graph. That is to say, sample neighborhood selection and edge weight assignment can be performed within one step by SR. Based on SR, Wright et al. [32] gave a sparse representation based classification (SRC) for face recognition. Qiao et al. [31] proposed sparsity preserving projections (SPP), whose aim is to preserve the sparse reconstruction relationship of the data. Yan et al. [26], [30] advocated the L1-graph for semi-supervised learning, clustering and subspace learning, whose core idea is similar to SPP. As an extension of SPP, graph optimization for dimensionality reduction with sparsity constraints (GODRSC) [33] learns the L1 graph and the projection matrix iteratively. With the available label information of the training samples, Yang et al. [34] developed an iterative SRC-steered discriminative projection (SRC-DP) approach for feature extraction, designed according to the decision rule of SRC. The optimal linear transformation of SRC-DP is achieved by maximizing the between-class reconstruction residual and minimizing the within-class reconstruction residual in the low-dimensional projected space. Using SRC as the classifier, SRC-DP performs very well in face recognition. Gui et al. [35] gave discriminant sparse neighborhood preserving embedding (DSNPE) by integrating a class-specific sparse graph and MMC [5] for face recognition. However, if the number of training samples in each class is small, the class-specific SR may be inaccurate. Besides seeking sparse graphs for DR, some approaches are devoted to pursuing sparse projection vectors, such as sparse PCA (SPCA) [36], the double shrinking algorithm (DSA) [37] and sparse two-dimensional locality discriminant projections (S2DLDP) [38], and some others adopt SR to learn discriminant dictionaries for better classification [39], [40]. Although SR-based approaches have achieved impressive results in various fields, their computational complexity can be very high since they need to solve the L1 norm optimization problem iteratively. Moreover, the working mechanism of SR is not yet fully understood and needs further investigation.
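The one-step construction described above, where selecting neighbors and assigning edge weights both fall out of a single L1-regularized least squares problem, can be sketched as follows. The solver (ISTA, a standard iterative shrinkage-thresholding scheme) and all names here are our own choices for illustration, not the specific algorithm of any cited paper:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(D, y, lam=0.1, n_iter=500):
    """Solve min_w 0.5*||y - D w||^2 + lam*||w||_1 by ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        w = soft(w + D.T @ (y - D @ w) / L, lam / L)
    return w

def l1_graph(X, lam=0.1):
    """Code each sample over all remaining samples; the nonzero
    coefficients pick the neighbors AND supply the edge weights."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        D = np.delete(X, i, axis=1)        # dictionary = all other samples
        w = sparse_code_ista(D, X[:, i], lam)
        W[i, np.arange(n) != i] = w
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W = l1_graph(X, lam=0.2)                   # 8x8 weight matrix, zero diagonal
```

The per-sample iterative solve is also where the high computational cost mentioned above comes from, which motivates the L2 alternative discussed next.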

Great efforts have been made to understand the working mechanism of SR [41], [42], [43], [44]. For example, Zhang et al. [43], [44] claimed that it is the collaborative representation mechanism, rather than the L1 norm constraint on the representation coefficients, that contributes more to the success of SRC for face recognition. In other words, the collaborative representation mechanism is reflected in the use of a linear combination of all the training samples to represent a query sample (i.e., the least square part), not in the norm constraint on the coding coefficients (i.e., the regularization part). Based on this finding, they presented collaborative representation based classification (CRC) [43] by simply replacing the expensive L1 norm in SRC with the cheaper L2 norm. From this point of view, both SRC and CRC can be viewed as collaborative representation, but with different norm constraints (i.e., L1 and L2 norm, respectively), and both belong to the regularized least square framework. Although the L2 norm in CRC induces much weaker sparsity in the representation coefficients than the L1 norm, it admits an efficient closed-form solution and leads to very competitive performance [44]. Some kernel variants of CRC were also presented in [45], [46], [47] to further boost the performance of CRC. As indicated in [31], [48], CR mainly characterizes the local geometric information of the data, as SR does. To make full use of these nice properties, Yang et al. [48] first built an L2 graph by CR (with L2 norm regularization) and then designed an unsupervised collaborative representation based projections (CRP) method to preserve the collaborative reconstruction relationship of the data. In [49], Hua et al. proposed collaborative representation reconstruction based projections (CRRP) for DR by considering the classification rule of CRC. A similar method was also presented in [50].
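The closed-form solution that makes the L2 variant cheap is just ridge regression over the training dictionary: w = (DᵀD + λI)⁻¹Dᵀy. A minimal sketch (dictionary, query and parameter values are our own toy choices):

```python
import numpy as np

def l2_collab_code(D, y, lam=1e-4):
    """Collaborative representation with L2 regularization:
    w = argmin ||y - D w||^2 + lam*||w||^2 = (D^T D + lam I)^{-1} D^T y,
    obtained in one linear solve instead of an iterative L1 optimization."""
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 12))        # 12 training samples, 8-dimensional
y = D @ rng.standard_normal(12)         # a query in the span of the dictionary
w = l2_collab_code(D, y)                # dense coefficients, small residual
```

Unlike the L1 solution, w is generally dense, but one solve per query (or one matrix inverse shared by all queries) replaces hundreds of shrinkage iterations, which is the efficiency argument made in [43], [44].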
The authors in [51] developed a supervised method called regularized least square based discriminative projections (RLSDP) for feature extraction. RLSDP obtains a discriminant subspace by simultaneously maximizing the between-class scatter of LDA and minimizing the collaborative reconstruction error from the same class. Nevertheless, RLSDP fails to minimize the distances between samples with the same class label, which carry the most important compactness information [52], and it shares LDA's limitation that the number of available projection axes is less than the number of classes. In addition, the joint discriminative dimensionality reduction and dictionary learning (JDDRDL) method [53] couples discriminative DR and dictionary learning in a unified energy minimization framework, which further enhances the representation accuracy and discriminant ability of CRC. However, it has up to seven hyperparameters (including the size of the dictionary), which makes it hard to use in real applications.

In this paper, two collaborative preserving Fisher discriminant analysis (CPFDA) approaches are proposed for DR. We denote them by L1CPFDA and L2CPFDA hereafter, according to the norm (i.e., L1 or L2) applied to the coding coefficients during graph construction. More specifically, an L1 graph (corresponding to the L1 norm) or L2 graph (corresponding to the L2 norm) is first built via least squares with the respective norm regularization, and subsequently incorporated into the LDA framework to search for a discriminant projection subspace. We summarize several characteristics of our proposed L1CPFDA and L2CPFDA as follows:

  • Our proposed L1CPFDA and L2CPFDA adopt CR with different norm regularizations (i.e., L1 and L2) on the coding coefficients for L1 graph and L2 graph construction, respectively. The whole process is data adaptive, with no need to specify local neighborhood parameters.

  • Both L1CPFDA and L2CPFDA can simultaneously exploit the local geometric information hidden in the CR coefficients and the global discriminant information inherited from LDA, since the L1 graph [31] and L2 graph [48] mainly characterize the local geometric structure while LDA well captures the global discriminant structure of the data set. Theoretical and experimental analyses of L1CPFDA and L2CPFDA also reveal that they can best preserve the collaborative reconstruction relationship of the data while discriminating samples of different classes.

  • Our L1CPFDA and L2CPFDA are able to obtain more meaningful projection vectors than LDA, thanks to the localization properties of the L1 graph and L2 graph. Further investigation of the ranks of the corresponding scatter matrices shows that the achievable number of projection directions of the proposed approaches is generally twice that of LDA. Moreover, both L1CPFDA and L2CPFDA include LDA as a special case.
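The graph-embedding recipe these bullets describe, regularize LDA's objective with a Laplacian term built from the CR graph, can be sketched generically. The exact CPFDA objective appears in Section 3; the function below, including the trade-off parameter `alpha` and the generalized eigenproblem it solves, is only our hypothetical illustration of how a graph term and the LDA scatters can be fused:

```python
import numpy as np

def graph_lda_directions(X, labels, W, alpha=1.0, lam=1e-6):
    """Hypothetical sketch: solve  Sb v = eig * (Sw + alpha * X L X^T) v,
    where L is the Laplacian of a (possibly CR-derived) affinity graph W.
    With alpha = 0 this reduces to ordinary regularized LDA."""
    m, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((m, m))
    Sw = np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T
        Sw += (Xc - mc) @ (Xc - mc).T
    Ws = (W + W.T) / 2                               # symmetrized graph
    L = np.diag(Ws.sum(axis=1)) - Ws                 # graph Laplacian
    A = Sw + alpha * X @ L @ X.T + lam * np.eye(m)   # regularized denominator
    vals, vecs = np.linalg.eig(np.linalg.solve(A, Sb))
    order = np.argsort(-vals.real)
    return vecs[:, order].real                       # columns = directions
```

Because the Laplacian term contributes rank beyond Sb's C − 1, such a fused objective can yield more usable directions than plain LDA, consistent with the "twice that of LDA" observation above.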

The remainder of this paper is organized as follows. In Section 2, we briefly review CRP and LDA. The unified CR framework and the proposed L1CPFDA and L2CPFDA, along with their properties, computational complexity and connections with other methods, are presented in Section 3. The experimental results are presented in Section 4. Finally, concluding remarks and future work are given in Section 5.

Section snippets

Related works

Suppose there are n training samples X = [x_1, x_2, …, x_n] ∈ R^(m×n) belonging to C classes, where x_i ∈ R^m is the ith sample. Let n_c be the number of samples in the cth class, so that ∑_{c=1}^{C} n_c = n. In what follows, we make a brief review of the representative CRP and LDA methods.

Collaborative preserving Fisher discriminant analysis (CPFDA)

Here, we first present a unified collaborative representation framework (UCRF) that covers both CR and SR. Then, we put forward the two CPFDA approaches (i.e., L1CPFDA and L2CPFDA) in detail, analyze their computational complexity and discuss their fundamental properties. Finally, we compare our proposed methods with other related ones.

Experimental results

In this section, we conduct experiments on the ORL, AR and FERET face databases and the COIL-20 object database to evaluate the effectiveness of our proposed L1CPFDA and L2CPFDA.

Conclusions and further work

We have presented in this paper two supervised dimensionality reduction methods, coined collaborative preserving Fisher discriminant analysis (L1CPFDA/L2CPFDA), for image recognition tasks. In L1CPFDA/L2CPFDA, an L1/L2 graph is first built by collaborative representation with L1/L2 norm regularization, which can be derived efficiently (in closed form in the L2 case). Different from MFA and LFDA, the L1/L2 graph is datum adaptive, and the manual selection of neighbors for each sample is avoided.

Conflict of interest

None.

Acknowledgments

The authors would like to thank Editors and anonymous reviewers for their valuable comments and suggestions to improve the quality of this paper. This work was supported by the National Natural Science Foundation of China under Grants 61271293 and 61803293.


References (71)

  • L. Zhang et al., Graph optimization for dimensionality reduction with sparsity constraints, Pattern Recognit. (2012)
  • J. Gui et al., Discriminant sparse neighborhood preserving embedding for face recognition, Pattern Recognit. (2012)
  • Z. Lai et al., Sparse two-dimensional local discriminant projections for feature extraction, Neurocomputing (2011)
  • M. Yang et al., Gabor feature based robust representation and classification for face recognition with Gabor occlusion dictionary, Pattern Recognit. (2013)
  • J. Yang et al., Beyond sparsity: the role of L1-optimizer in pattern classification, Pattern Recognit. (2012)
  • W. Yang et al., Image classification using kernel collaborative representation with regularized least square, Appl. Math. Comput. (2013)
  • D. Wang et al., Kernel collaborative face recognition, Pattern Recognit. (2015)
  • W. Yang et al., A collaborative representation based projections method for feature extraction, Pattern Recognit. (2015)
  • J. Hua et al., Dimension reduction using collaborative representation reconstruction based projections, Neurocomputing (2016)
  • J. Yin et al., Optimized projection for collaborative representation based classification and its applications to face recognition, Pattern Recognit. Lett. (2016)
  • W. Yang et al., A regularized least square based discriminative projections for feature extraction, Neurocomputing (2016)
  • M.-D. Yuan et al., Enhanced regularized least square based discriminative projections for feature extraction, Signal Process. (2017)
  • Z. Feng et al., Joint discriminative dimensionality reduction and dictionary learning for face recognition, Pattern Recognit. (2013)
  • M.-D. Yuan et al., Collaborative representation discriminant embedding for image classification, J. Vis. Commun. Image Represent. (2016)
  • P.J. Phillips et al., The FERET database and evaluation procedure for face-recognition algorithms, Image Vis. Comput. (1998)
  • M. Turk et al., Eigenfaces for recognition, J. Cogn. Neurosci. (1991)
  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
  • D.D. Lee et al., Learning the parts of objects by non-negative matrix factorization, Nature (1999)
  • N. Guan et al., NeNMF: an optimal gradient method for nonnegative matrix factorization, IEEE Trans. Signal Process. (2012)
  • H. Li et al., Efficient and robust feature extraction by maximum margin criterion, IEEE Trans. Neural Netw. (2006)
  • M.A. Hearst et al., Support vector machines, IEEE Intell. Syst. Appl. (1998)
  • K.I. Kim et al., Face recognition using kernel principal component analysis, IEEE Signal Process. Lett. (2002)
  • S. Mika et al., Fisher discriminant analysis with kernels
  • S. Zafeiriou et al., Regularized kernel discriminant analysis with a robust kernel for face recognition and verification, IEEE Trans. Neural Netw. Learn. Syst. (2012)
  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)

    Ming-Dong Yuan received the B.S. Degree in Agricultural Electrification and Automation from Sichuan Agricultural University, Ya'an, China, in 2009, and the Ph.D. degree in Signal and Information Processing at Xidian University, Xi'an, China, in 2017. Now, he is with CETC Key Laboratory of Smart City Modeling Simulation and Intelligent Technology, Shenzhen, China. His current research interests involve subspace learning, feature selection, matrix factorization, sparse and low-rank representation.

    Da-Zheng Feng received the Diploma degree from Xi'an University of Technology, Xi'an, China, in 1982, the M. S. degree from Xi'an Jiaotong University, Xi'an, China, in 1986, and the Ph.D. degree in Electronic Engineering from Xidian University, Xi'an, China, in 1995. From May 1996 to May 1998, he was a Postdoctoral Research Affiliate with Xi'an Jiaotong University, China. From May 1998 to June 2000, he was an Associate Professor with Xidian University. Since July 2000, he has been a Professor at Xidian University. His current research interests include signal processing, intelligence and brain information processing, and radar techniques.

    Ya Shi received the B.S. Degree in Electronic and Information Engineering from Xidian University, Xi'an, China, in 2008, and the M.S. Degree in Pattern Recognition and Intelligent System from Xidian University in 2011, and the Ph.D. Degree in Pattern Recognition and Intelligent System from the same university in 2015. Since then, she has been a lecturer in Xi'an University of Architecture and Technology, Xi'an, China. Her current research interests include Machine Learning and Pattern Recognition.

    Wen-Juan Liu received the B.S. Degree in Electronic and Information Engineering from Xidian University, Xi'an, China, in 2009, and the Ph.D. degree in Signal and Information Processing from the same university, in 2016. After graduation, she joined Beijing Xiaomi Intelligent Technology Co., Ltd, and currently works as a Signal Processing Engineer for the design of Binaural Speech Separation System in Reverberant Environments. Her current research interests include blind speech separation, adaptive signal processing and statistical signal processing.
