
Pattern Recognition

Volume 45, Issue 7, July 2012, Pages 2733-2742

Feature space locality constraint for kernel based nonlinear discriminant analysis

https://doi.org/10.1016/j.patcog.2012.01.012

Abstract

Subspace learning is an important approach in pattern recognition. Nonlinear discriminant analysis (NDA), due to its capability of describing the nonlinear manifold structure of samples, is considered more powerful for classification tasks in image related problems. Kernel based NDA involves three spaces: the original data space, the implicitly mapped high dimension feature space, and the target low dimension subspace. Existing methods mainly focus on the information in the original data space to find the most discriminant low dimension subspace. The implicit high dimension feature space plays the role of connecting the original space and the target subspace to realize the nonlinear dimension reduction, but the sample geometric structure information in the feature space is not involved. In this work, we try to utilize and explore this information. Specifically, the locality information of samples in the feature space is modeled and integrated into traditional kernel based NDA methods. In this way, the sample distributions in both the original data space and the mapped high dimension feature space are modeled, and more information is expected to be exploited to improve the discriminative ability of the subspace. Two algorithms, named FSLC-KDA and FSLC-KSR, are presented. Extensive experiments on the ORL, Extended-YaleB, PIE, Multi-PIE and FRGC databases validate the efficacy of the proposed method.

Highlights

► The relationship of three spaces in nonlinear discriminant analysis is analyzed.
► Locality constraint in mapped high dimension feature space is proposed and integrated into the nonlinear discriminant analysis method.
► Two examples FSLC-KDA and FSLC-KSR are presented to show how FSLC can be integrated with the existing NDA methods.
► The linear version of FSLC based method is presented.
► Extensive experiments are conducted to show the superiority of the proposed FSLC based methods.

Introduction

Subspace learning, due to its efficacy and efficiency, has achieved great success in pattern recognition and computer vision, e.g., in face recognition and image retrieval. Objects in the natural world usually exhibit diverse and distinctive visual characteristics for human perception. For pattern recognition, however, the essential factors hidden behind these various representations are considered more important and are the goals we pursue. Subspace learning is a tool used to extract the essential (usually low dimension) manifold structures of objects from diverse high dimension observations. Without loss of generality, in this work we mainly focus on image related classification problems.

Among various subspace learning methods, principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are the two representative ones. PCA uses the Karhunen–Loève transform to produce the most expressive subspace for image representation by capturing the maximum-variation directions in the data space; classification is then fulfilled by minimizing the reconstruction residual. The objective of LDA is to find a subspace that maximizes the distance between samples from different classes (between-class scatter $S_b$) while minimizing the distance between samples from the same class (within-class scatter $S_w$). The derived subspace is thus discriminative enough to classify different samples correctly. In real world applications, due to the high dimension of samples and the relatively small sample size, the within-class scatter is usually singular and the traditional LDA solution cannot be obtained directly. To address this problem, many LDA variants, such as Fisher LDA (PCA+LDA) [2], null space LDA [3] and direct LDA [4], have been proposed.
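For concreteness, LDA's objective is the classical Fisher criterion (textbook formulation, not specific to this paper; with $C$ classes, $n_c$ samples and mean $\mu_c$ in class $c$, and global mean $\mu$):

$$S_b=\sum_{c=1}^{C} n_c\,(\mu_c-\mu)(\mu_c-\mu)^{T},\qquad S_w=\sum_{c=1}^{C}\sum_{x_i\in \mathrm{class}\ c}(x_i-\mu_c)(x_i-\mu_c)^{T},$$

$$w^{*}=\arg\max_{w}\ \frac{w^{T}S_b\,w}{w^{T}S_w\,w}.$$

When $S_w$ is singular, as in the small sample size setting described above, the ratio is not well defined for directions in the null space of $S_w$, which is precisely what the cited LDA variants work around.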

Locality preserving projections (LPP) [5] is another popular method that has been successfully applied to object recognition. In LPP, the neighboring structure of each sample is considered, and the objective is to find the subspace that preserves this local neighboring structure as much as possible. The original LPP is an unsupervised method in which class information is not utilized. A number of LPP variants [6], [7], [8] that combine LPP with class information have been proposed to improve classification performance.
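For reference, the standard LPP objective seeks a projection $w$ that keeps neighboring samples close after projection (with neighborhood weights $W_{ij}$, e.g. heat-kernel weights, and degree matrix $D_{ii}=\sum_j W_{ij}$):

$$\min_{w}\ \sum_{i,j}\big(w^{T}x_i-w^{T}x_j\big)^{2}\,W_{ij}\quad \text{s.t.}\quad w^{T}XDX^{T}w=1.$$

A large $W_{ij}$ heavily penalizes mapping the neighbors $x_i$ and $x_j$ far apart, which is how the local neighboring structure is preserved.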

Yan et al. [9] unify and reinterpret subspace learning in a graph embedding framework. The graph is composed of a vertex set and a similarity matrix: each vertex denotes a sample in the data space, and each element of the similarity matrix describes the relationship between a pair of samples. A graph defined in this way is able to characterize various statistical or geometric properties of the data set. The purpose of graph embedding is to find a low dimension representation for each vertex that preserves the relationships among them. With a proper definition of the similarity matrix, almost all the aforementioned subspace learning methods can be implemented in this framework. Moreover, graph embedding provides a direction and a platform for designing novel subspace learning methods. Marginal Fisher analysis (MFA) [9] and spectral regression (SR) [10] are two methods derived directly from the graph embedding view.
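In this framework, writing $y$ for the embedding coordinates of the vertices, $L=D-W$ for the graph Laplacian with $D_{ii}=\sum_j W_{ij}$, and $B$ for a method-specific constraint matrix, the generic graph embedding objective of [9] reads

$$y^{*}=\arg\min_{y^{T}By=c}\ \sum_{i\neq j}\|y_i-y_j\|^{2}\,W_{ij}=\arg\min_{y^{T}By=c}\ y^{T}Ly,$$

so that different choices of $W$ and $B$ recover the methods discussed above.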

It is well known that image samples usually lie on a nonlinear manifold in data space [11]. Although linear methods are efficient for dimensionality reduction, they are not accurate enough to describe the subtleties of image manifolds, owing to their limitations in handling the nonlinearity of the manifold structure: protrusions of nonlinear manifolds may be smoothed and concavities may be filled in, causing unfavorable consequences. Nonlinear discriminant analysis (NDA) has therefore been proposed to deal with the nonlinear dimensionality reduction problem. Many nonlinear techniques that attempt to preserve the global/local properties of the original data, or to perform global alignment of a mixture of linear models, have been proposed [12], such as ISOMAP [13], LLE [14] and LLC [15]. Among various NDA methods, one important branch is the kernel based nonlinear discriminant analysis method [16]. It adopts the kernel trick, as in SVM [17], to map the original data nonlinearly into a high dimension feature space, and then the traditional linear discriminant analysis method is applied to the mapped data. In this way, a series of kernel formulations of linear subspace methods, such as kernel PCA (KPCA) [16], kernel LDA (KDA) [18] and kernel LPP (KLPP) [19], are derived.
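Concretely, the kernel trick rests on the standard fact that any discriminant direction in the feature space can be expanded over the mapped training samples, so all computations reduce to kernel evaluations and the mapping $\phi$ never needs to be computed explicitly:

$$w=\sum_{i=1}^{N}\alpha_i\,\phi(x_i)\ \Longrightarrow\ w^{T}\phi(x)=\sum_{i=1}^{N}\alpha_i\,k(x_i,x).$$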

Existing kernel based NDA methods, including KPCA, KDA, KLPP, kernel LFDA [20], etc., mainly focus on the sample distribution in the original data space. That is, only the geometric structure of the original data is considered during the NDA process. The high dimension feature space is no more than a link between the original space and the target subspace; no sample geometric structure in the feature space is involved. On the other hand, in kernel based NDA the target subspace is derived from the feature space rather than the original one. We argue that the sample geometric structure information contained in the high dimension feature space is also important and useful for discriminant analysis. In this paper, we try to model and explore this information and integrate it into the kernel based nonlinear discriminant analysis process. Specifically, the locality information of samples in the feature space is extracted as a constraint for kernel based NDA methods. In this way, the sample geometric structure information in both the original and the mapped high dimension feature space is utilized, and better discriminative generalization of the derived subspace is expected. Fig. 1 illustrates the different sample distribution information utilized in linear, traditional kernel, and our feature space locality constraint based kernel NDA methods.
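A fact worth stating here, since the proposed constraint relies on feature space geometry: distances between mapped samples are computable from the kernel alone,

$$\|\phi(x_i)-\phi(x_j)\|^{2}=k(x_i,x_i)-2\,k(x_i,x_j)+k(x_j,x_j),$$

so feature space neighborhoods can be determined without an explicit $\phi$. For kernels with $k(x,x)=1$, such as the RBF kernel, this simplifies to $2-2\,k(x_i,x_j)$.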

The remainder of this paper is organized as follows. Section 2 introduces the feature space locality constraint (FSLC) and describes how it can be integrated into kernel NDA. Section 3 details the solutions of two representative FSLC based kernel NDA methods, FSLC-KDA and FSLC-KSR. Extensive experiments on five databases and the corresponding discussions are given in Section 4, and Section 5 concludes the paper.

Section snippets

Feature space locality constraint based kernel nonlinear discriminant analysis

In this section, we introduce the locality constraint in feature space and show how it can be integrated into kernel based nonlinear discriminant analysis.

In the graph embedding formulation [9], the sample distribution is characterized by a graph $G=\{X,W\}$, where $X=[x_1,x_2,\ldots,x_N]$ is the vertex set and $x_i\ (i=1,2,\ldots,N)$ denotes a sample in the data space. $W$ is the similarity matrix, whose element $W_{ij}$ defines the similarity between samples $i$ and $j$. The purpose of graph embedding is to find a low dimension representation for each vertex that preserves the relationships among them. …
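To make the formulation concrete, the following is a minimal NumPy/SciPy sketch of linearized graph embedding (our own illustrative code, not from the paper; the regularization constant is an assumption added for numerical stability):

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding(X, W, d):
    """Linearized graph embedding sketch.
    X: (D, N) data matrix, samples as columns.
    W: (N, N) symmetric similarity matrix.
    d: target subspace dimension.
    Returns a (D, d) projection matrix whose columns keep strongly
    connected vertices (large W_ij) close after projection."""
    Deg = np.diag(W.sum(axis=1))   # degree matrix D
    L = Deg - W                    # graph Laplacian L = D - W
    # Generalized eigenproblem X L X^T v = lam X D X^T v;
    # the smallest eigenvalues give the embedding directions.
    A = X @ L @ X.T
    B = X @ Deg @ X.T + 1e-6 * np.eye(X.shape[0])  # regularized (assumption)
    vals, vecs = eigh(A, B)        # eigenvalues in ascending order
    return vecs[:, :d]
```

In practice the trivial solution (a constant embedding) is discarded, and supervised methods replace $W$ with class-aware weights.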

FSLC-KDA and FSLC-KSR

The spirit of feature space locality constraint based kernel nonlinear discriminant analysis (FSLC-KNDA) is illustrated in Fig. 2. FSLC-KNDA takes into account the sample distributions in both the original and the mapped high dimension feature spaces in the process of discriminant analysis. In the following, we take kernel discriminant analysis (KDA) and kernel spectral regression (KSR) as two examples to show how the solutions of FSLC-KDA and FSLC-KSR can be derived. It is worth noting that the feature …
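As background for the two derivations, here is a minimal sketch of the ordinary KDA baseline in the common kernel-matrix form (our own code under standard assumptions, not the paper's FSLC-KDA, which additionally imposes the feature space locality constraint derived in this section):

```python
import numpy as np
from scipy.linalg import eigh

def kda(K, labels, d, reg=1e-4):
    """Ordinary KDA sketch: with centered kernel matrix K, maximize
    alpha^T K W K alpha / alpha^T (K K + reg*I) alpha, where
    W_ij = 1/n_c for same-class pairs and 0 otherwise.
    Returns (N, d) expansion coefficients alpha."""
    labels = np.asarray(labels)
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    Kc = H @ K @ H                        # center the kernel matrix
    W = np.zeros((N, N))                  # class-block weight matrix
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    A = Kc @ W @ Kc                       # between-class term
    B = Kc @ Kc + reg * np.eye(N)         # regularized total term
    vals, vecs = eigh(A, B)               # ascending eigenvalues
    return vecs[:, ::-1][:, :d]           # keep the d largest
```

A new sample $x$ is then projected as $\alpha^{T}[k(x_1,x),\ldots,k(x_N,x)]^{T}$, with the same centering applied to the kernel vector.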

Experiments

The proposed methods FSLC-KDA and FSLC-KSR are compared with the ordinary KDA and KSR ones. For the nonlinear kernel based methods, the RBF kernel $k(x_i,x_j)=\exp\{-\|x_i-x_j\|^{2}/2\sigma\}$ is used. We also use the identity kernel (i.e., $\phi(x)=x$, $k(x_i,x_j)=x_i^{T}x_j$) to realize linear versions of the FSLC based methods. It is easy to see that KDA and KSR are equivalent to LDA and linear SR (LSR) if the identity kernel is used. For the FSLC based methods with the identity kernel, the high dimension feature space and the original data space coincide. …
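The two kernels used in the experiments are simple to implement; a small sketch with our own helper names (the clipping of tiny negative squared distances is an assumption added for numerical safety):

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """RBF kernel from the experiments:
    k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma)).
    X: (n, D), Y: (m, D); returns the (n, m) kernel matrix."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma))

def linear_kernel(X, Y):
    """Identity mapping phi(x) = x, i.e. k(x_i, x_j) = x_i^T x_j;
    with this kernel, KDA/KSR reduce to LDA/LSR."""
    return X @ Y.T
```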

Conclusions

We analyze the relationship among the original data space, the nonlinearly mapped high dimension feature space and the target low dimension discriminant subspace in nonlinear discriminant analysis methods. The locality information in the high dimension feature space is usually ignored in existing NDA methods. Since the low dimension discriminant subspace is learned from the feature space directly, the sample geometric structure information contained in the high dimension feature space is important and useful for discriminant analysis. …

Acknowledgement

This work was supported by the Chinese National Natural Science Foundation Project #61070146, #61105023, #61103156, #61105037, National IoT R&D Project #2150510, European Union FP7 Project #257289 (TABULA RASA http://www.tabularasa-euproject.org), and AuthenMetric R&D Funds.


References (29)

  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence (2007).
  • D. Cai et al., Spectral regression for efficient regularized subspace learning.
  • S.Z. Li, A.K. Jain (Eds.), Handbook of Face Recognition, Springer-Verlag, New York, ...
  • L. van der Maaten, E. Postma, H. van den Herik, Dimensionality Reduction: A Comparative Review, Technical Report, ...

Zhen Lei received the B.S. degree in automation from the University of Science and Technology of China (USTC), Hefei, China, in 2005 and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2010. He is currently with the Center for Biometrics and Security Research and the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests are in computer vision, pattern recognition, image processing, and face recognition in particular.

Zhiwei Zhang received the B.S. degree from Sichuan University, China, in 2009. He is now a graduate student at the Center for Biometrics and Security Research and the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests are in image processing and biometrics.

Stan Z. Li received the B.Eng. degree from Hunan University, Changsha, China, the M.Eng. degree from the National University of Defense Technology, China, and the Ph.D. degree from Surrey University, Surrey, UK. He is currently a Professor and the Director of the Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences (CASIA). He worked at Microsoft Research Asia as a Researcher from 2000 to 2004. Prior to that, he was an Associate Professor at Nanyang Technological University, Singapore. His research interests include pattern recognition and machine learning, image and vision processing, face recognition, biometrics, and intelligent video surveillance. He has published over 200 papers in international journals and conferences, and authored and edited eight books.

Dr. Li is currently an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and is acting as the Editor-in-Chief for the Encyclopedia of Biometrics. He served as a co-chair for the International Conference on Biometrics 2007 and 2009, and has been involved in organizing other international conferences and workshops in the fields of his research interest.
