
Pattern Recognition

Volume 45, Issue 7, July 2012, Pages 2733-2742

Feature space locality constraint for kernel based nonlinear discriminant analysis

https://doi.org/10.1016/j.patcog.2012.01.012

Abstract

Subspace learning is an important approach in pattern recognition. Nonlinear discriminant analysis (NDA), due to its capability of describing the nonlinear manifold structure of samples, is considered more powerful for classification tasks in image related problems. Kernel based NDA involves three spaces: the original data space, the implicitly mapped high dimension feature space, and the target low dimension subspace. Existing methods mainly focus on the information in the original data space to find the most discriminant low dimension subspace. The implicit high dimension feature space plays the role of connecting the original space and the target subspace to realize the nonlinear dimension reduction, but the sample geometric structure information in the feature space is not involved. In this work, we try to utilize and explore this information. Specifically, the locality information of samples in the feature space is modeled and integrated into traditional kernel based NDA methods. In this way, the sample distributions in both the original data space and the mapped high dimension feature space are modeled, and more information is expected to be exploited to improve the discriminative ability of the subspace. Two algorithms, named FSLC-KDA and FSLC-KSR, are presented. Extensive experiments on the ORL, Extended-YaleB, PIE, Multi-PIE and FRGC databases validate the efficacy of the proposed method.

Highlights

► The relationship of three spaces in nonlinear discriminant analysis is analyzed.
► Locality constraint in mapped high dimension feature space is proposed and integrated into the nonlinear discriminant analysis method.
► Two examples FSLC-KDA and FSLC-KSR are presented to show how FSLC can be integrated with the existing NDA methods.
► The linear version of FSLC based method is presented.
► Extensive experiments are conducted to show the superiority of the proposed FSLC based methods.

Introduction

Subspace learning, due to its efficacy and efficiency, has achieved great success in pattern recognition and computer vision, e.g., in face recognition and image retrieval. Objects in the natural world usually exhibit diverse and distinctive visual characteristics for human perception. For pattern recognition, however, the essential factors hidden behind these various representations are considered more important and are the goals we pursue. Subspace learning is a tool used to extract the essential (usually low dimension) manifold structures of objects from diverse high dimension observations. Without loss of generality, in this work we mainly focus on image related classification problems.

Among various subspace learning methods, principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are the two representative ones. PCA uses the Karhunen–Loève transform to produce the most expressive subspace for image representation by capturing the maximum-variation directions in the data space; classification is then fulfilled by minimizing the reconstruction residual. The objective of LDA is to find a subspace that maximizes the distance between samples from different classes (between-class scatter $S_b$) while minimizing the distance between samples from the same class (within-class scatter $S_w$). The derived subspace is thus discriminative enough to classify different samples correctly. In real world applications, due to the high dimension of samples and the relatively small sample size, the within-class scatter is usually singular and the traditional LDA solution cannot be obtained directly. To address this problem, many LDA variants, such as Fisher LDA (PCA+LDA) [2], null space LDA [3] and direct LDA [4], have been proposed.
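For concreteness, LDA's objective is the classical Fisher criterion (textbook formulation, not specific to this paper; with $C$ classes, $n_c$ samples and mean $\mu_c$ in class $c$, and global mean $\mu$):

$$S_b=\sum_{c=1}^{C} n_c\,(\mu_c-\mu)(\mu_c-\mu)^{T},\qquad S_w=\sum_{c=1}^{C}\sum_{x_i\in \mathrm{class}\ c}(x_i-\mu_c)(x_i-\mu_c)^{T},$$

$$w^{*}=\arg\max_{w}\ \frac{w^{T}S_b\,w}{w^{T}S_w\,w}.$$

When $S_w$ is singular, as in the small sample size setting described above, the ratio is not well defined for directions in the null space of $S_w$, which is precisely what the cited LDA variants work around.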

Locality preserving projections (LPP) [5] is another popular method that has been successfully applied to object recognition. In LPP, the neighboring structure of each sample is considered, and the objective is to find the subspace that preserves this local neighboring structure as much as possible. The original LPP is an unsupervised method in which class information is not utilized. A number of LPP variants [6], [7], [8] that combine LPP with class information have been proposed to improve classification performance.
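For reference, the standard LPP objective seeks a projection $w$ that keeps neighboring samples close after projection (with neighborhood weights $W_{ij}$, e.g. heat-kernel weights, and degree matrix $D_{ii}=\sum_j W_{ij}$):

$$\min_{w}\ \sum_{i,j}\big(w^{T}x_i-w^{T}x_j\big)^{2}\,W_{ij}\quad \text{s.t.}\quad w^{T}XDX^{T}w=1.$$

A large $W_{ij}$ heavily penalizes mapping the neighbors $x_i$ and $x_j$ far apart, which is how the local neighboring structure is preserved.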

Yan et al. [9] unify and reinterpret subspace learning in a graph embedding framework. The graph is composed of a vertex set and a similarity matrix: each vertex denotes a sample in the data space, and each element of the similarity matrix describes the relationship between a pair of samples. A graph defined in this way is able to characterize various statistical or geometric properties of the data set. The purpose of graph embedding is to find a low dimension representation for each vertex that preserves the relationships among them. With a proper definition of the similarity matrix, almost all the aforementioned subspace learning methods can be implemented in this framework. Moreover, graph embedding provides a direction and a platform for designing novel subspace learning methods. Marginal Fisher analysis (MFA) [9] and spectral regression (SR) [10] are two methods derived directly from the graph embedding view.
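In this framework, writing $y$ for the embedding coordinates of the vertices, $L=D-W$ for the graph Laplacian with $D_{ii}=\sum_j W_{ij}$, and $B$ for a method-specific constraint matrix, the generic graph embedding objective of [9] reads

$$y^{*}=\arg\min_{y^{T}By=c}\ \sum_{i\neq j}\|y_i-y_j\|^{2}\,W_{ij}=\arg\min_{y^{T}By=c}\ y^{T}Ly,$$

so that different choices of $W$ and $B$ recover the methods discussed above.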

It is well known that image samples usually lie on a nonlinear manifold in data space [11]. Although linear methods are efficient for dimensionality reduction, they are not accurate enough to describe the subtleties of image manifolds, owing to their limitations in handling the nonlinearity of the manifold structure: protrusions of nonlinear manifolds may be smoothed and concavities may be filled in, causing unfavorable consequences. Nonlinear discriminant analysis (NDA) has therefore been proposed to deal with the nonlinear dimensionality reduction problem. Many nonlinear techniques that attempt to preserve the global/local properties of the original data, or to perform global alignment of a mixture of linear models, have been proposed [12], such as ISOMAP [13], LLE [14] and LLC [15]. Among various NDA methods, one important branch is the kernel based nonlinear discriminant analysis method [16]. It adopts the kernel trick, as in SVM [17], to map the original data nonlinearly into a high dimension feature space, and then the traditional linear discriminant analysis method is applied to the mapped data. In this way, a series of kernel formulations of linear subspace methods, such as kernel PCA (KPCA) [16], kernel LDA (KDA) [18] and kernel LPP (KLPP) [19], are derived.
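Concretely, the kernel trick rests on the standard fact that any discriminant direction in the feature space can be expanded over the mapped training samples, so all computations reduce to kernel evaluations and the mapping $\phi$ never needs to be computed explicitly:

$$w=\sum_{i=1}^{N}\alpha_i\,\phi(x_i)\ \Longrightarrow\ w^{T}\phi(x)=\sum_{i=1}^{N}\alpha_i\,k(x_i,x).$$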

Existing kernel based NDA methods, including KPCA, KDA, KLPP, kernel LFDA [20], etc., mainly focus on the sample distribution in the original data space. That is, only the geometric structure of the original data is considered during the NDA process. The high dimension feature space is no more than a link between the original space and the target subspace; no sample geometric structure in the feature space is involved. On the other hand, in kernel based NDA the target subspace is derived from the feature space rather than the original one. We argue that the sample geometric structure information contained in the high dimension feature space is also important and useful for discriminant analysis. In this paper, we try to model and explore this information and integrate it into the kernel based nonlinear discriminant analysis process. Specifically, the locality information of samples in the feature space is extracted as a constraint for kernel based NDA methods. In this way, the sample geometric structure information in both the original and the mapped high dimension feature space is utilized, and better discriminative generalization of the derived subspace is expected. Fig. 1 illustrates the different sample distribution information utilized in linear, traditional kernel, and our feature space locality constraint based kernel NDA methods.
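A fact worth stating here, since the proposed constraint relies on feature space geometry: distances between mapped samples are computable from the kernel alone,

$$\|\phi(x_i)-\phi(x_j)\|^{2}=k(x_i,x_i)-2\,k(x_i,x_j)+k(x_j,x_j),$$

so feature space neighborhoods can be determined without an explicit $\phi$. For kernels with $k(x,x)=1$, such as the RBF kernel, this simplifies to $2-2\,k(x_i,x_j)$.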

The remainder of this paper is organized as follows. Section 2 introduces the feature space locality constraint (FSLC) and describes how it can be integrated into kernel NDA. Section 3 details the solutions of two representative FSLC based kernel NDA methods, FSLC-KDA and FSLC-KSR. Extensive experiments on five databases and the corresponding discussions are given in Section 4, and Section 5 concludes the paper.

Section snippets

Feature space locality constraint based kernel nonlinear discriminant analysis

In this section, we introduce the locality constraint in feature space and show how it can be integrated into kernel based nonlinear discriminant analysis.

In the graph embedding formulation [9], the sample distribution is characterized by a graph $G=\{X,W\}$, where $X=[x_1,x_2,\ldots,x_N]$ is the vertex set and $x_i\ (i=1,2,\ldots,N)$ denotes a sample in the data space. $W$ is the similarity matrix, whose element $W_{ij}$ defines the similarity between samples $i$ and $j$. The purpose of graph embedding is to find a low dimension representation for each vertex that preserves the relationships among them. …
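To make the formulation concrete, the following is a minimal NumPy/SciPy sketch of linearized graph embedding (our own illustrative code, not from the paper; the regularization constant is an assumption added for numerical stability):

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding(X, W, d):
    """Linearized graph embedding sketch.
    X: (D, N) data matrix, samples as columns.
    W: (N, N) symmetric similarity matrix.
    d: target subspace dimension.
    Returns a (D, d) projection matrix whose columns keep strongly
    connected vertices (large W_ij) close after projection."""
    Deg = np.diag(W.sum(axis=1))   # degree matrix D
    L = Deg - W                    # graph Laplacian L = D - W
    # Generalized eigenproblem X L X^T v = lam X D X^T v;
    # the smallest eigenvalues give the embedding directions.
    A = X @ L @ X.T
    B = X @ Deg @ X.T + 1e-6 * np.eye(X.shape[0])  # regularized (assumption)
    vals, vecs = eigh(A, B)        # eigenvalues in ascending order
    return vecs[:, :d]
```

In practice the trivial solution (a constant embedding) is discarded, and supervised methods replace $W$ with class-aware weights.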

FSLC-KDA and FSLC-KSR

The spirit of feature space locality constraint based kernel nonlinear discriminant analysis (FSLC-KNDA) is illustrated in Fig. 2. FSLC-KNDA takes into account the sample distributions in both the original and the mapped high dimension feature spaces in the process of discriminant analysis. In the following, we take kernel discriminant analysis (KDA) and kernel spectral regression (KSR) as two examples to show how the solutions of FSLC-KDA and FSLC-KSR can be derived. It is worth noting that the feature …
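As background for the two derivations, here is a minimal sketch of the ordinary KDA baseline in the common kernel-matrix form (our own code under standard assumptions, not the paper's FSLC-KDA, which additionally imposes the feature space locality constraint derived in this section):

```python
import numpy as np
from scipy.linalg import eigh

def kda(K, labels, d, reg=1e-4):
    """Ordinary KDA sketch: with centered kernel matrix K, maximize
    alpha^T K W K alpha / alpha^T (K K + reg*I) alpha, where
    W_ij = 1/n_c for same-class pairs and 0 otherwise.
    Returns (N, d) expansion coefficients alpha."""
    labels = np.asarray(labels)
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    Kc = H @ K @ H                        # center the kernel matrix
    W = np.zeros((N, N))                  # class-block weight matrix
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    A = Kc @ W @ Kc                       # between-class term
    B = Kc @ Kc + reg * np.eye(N)         # regularized total term
    vals, vecs = eigh(A, B)               # ascending eigenvalues
    return vecs[:, ::-1][:, :d]           # keep the d largest
```

A new sample $x$ is then projected as $\alpha^{T}[k(x_1,x),\ldots,k(x_N,x)]^{T}$, with the same centering applied to the kernel vector.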

Experiments

The proposed methods FSLC-KDA and FSLC-KSR are compared with the ordinary KDA and KSR ones. For the nonlinear kernel based methods, the RBF kernel $k(x_i,x_j)=\exp\{-\|x_i-x_j\|^{2}/2\sigma\}$ is used. We also use the identity kernel (i.e., $\phi(x)=x$, $k(x_i,x_j)=x_i^{T}x_j$) to realize linear versions of the FSLC based methods. It is easy to see that KDA and KSR are equivalent to LDA and linear SR (LSR) if the identity kernel is used. For the FSLC based methods with the identity kernel, the high dimension feature space and the original data space coincide. …
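The two kernels used in the experiments are simple to implement; a small sketch with our own helper names (the clipping of tiny negative squared distances is an assumption added for numerical safety):

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """RBF kernel from the experiments:
    k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma)).
    X: (n, D), Y: (m, D); returns the (n, m) kernel matrix."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma))

def linear_kernel(X, Y):
    """Identity mapping phi(x) = x, i.e. k(x_i, x_j) = x_i^T x_j;
    with this kernel, KDA/KSR reduce to LDA/LSR."""
    return X @ Y.T
```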

Conclusions

We analyze the relationship among the original data space, the nonlinearly mapped high dimension feature space and the target low dimension discriminant subspace in nonlinear discriminant analysis methods. The locality information in the high dimension feature space is usually ignored in existing NDA methods. Since the low dimension discriminant subspace is learned from the feature space directly, the sample geometric structure information contained in the high dimension feature space is important and useful for discriminant analysis. …

Acknowledgement

This work was supported by the Chinese National Natural Science Foundation Project #61070146, #61105023, #61103156, #61105037, National IoT R&D Project #2150510, European Union FP7 Project #257289 (TABULA RASA http://www.tabularasa-euproject.org), and AuthenMetric R&D Funds.


References (29)

  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence (2007).
  • D. Cai et al., Spectral regression for efficient regularized subspace learning.
  • S.Z. Li, A.K. Jain (Eds.), Handbook of Face Recognition, Springer-Verlag, New York, ...
  • L. van der Maaten, E. Postma, H. van den Herik, Dimensionality Reduction: A Comparative Review, Technical Report, ...

Zhen Lei received the B.S. degree in automation from the University of Science and Technology of China (USTC), Hefei, China, in 2005 and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2010. He is currently with the Center for Biometrics and Security Research and the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests are in computer vision, pattern recognition, image processing, and face recognition in particular.

Zhiwei Zhang received the B.S. degree from Sichuan University, China, in 2009. He is now a graduate student at the Center for Biometrics and Security Research and the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests are in image processing and biometrics.

Stan Z. Li received the B.Eng. degree from Hunan University, Changsha, China, the M.Eng. degree from the National University of Defense Technology, China, and the Ph.D. degree from Surrey University, Surrey, UK. He is currently a Professor and the Director of the Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences (CASIA). He worked at Microsoft Research Asia as a Researcher from 2000 to 2004. Prior to that, he was an Associate Professor at Nanyang Technological University, Singapore. His research interests include pattern recognition and machine learning, image and vision processing, face recognition, biometrics, and intelligent video surveillance. He has published over 200 papers in international journals and conferences, and authored and edited eight books.

Dr. Li is currently an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and is acting as the Editor-in-Chief for the Encyclopedia of Biometrics. He served as a co-chair for the International Conference on Biometrics 2007 and 2009, and has been involved in organizing other international conferences and workshops in the fields of his research interest.
