Discriminant common vectors versus neighbourhood components analysis and Laplacianfaces: A comparative study in small sample size problem

https://doi.org/10.1016/j.imavis.2005.11.007

Abstract

Discriminant common vectors (DCV), neighbourhood components analysis (NCA) and Laplacianfaces (LAP) are three recently proposed methods which can effectively learn linear projection matrices for dimensionality reduction in face recognition, where the dimension of the sample space is typically larger than the number of samples in the training set and consequently the so-called small sample size (SSS) problem exists. The three methods obtain their respective projection matrices from different objective functions, and all claim to be superior to methods such as principal component analysis (PCA) and PCA plus linear discriminant analysis (PCA+LDA) in terms of classification accuracy. However, no comparative study among them has been reported in the literature. In this paper, we carry out such a comparative study in face recognition (or, more generally, in the SSS problem) and argue that the projection matrix yielded by DCV is the optimal solution to both NCA and LAP in terms of their respective objective functions, whereas neither NCA nor LAP may attain its own optimal solution. In addition, we show that DCV is more efficient than both NCA and LAP for both linear dimensionality reduction and the subsequent classification in the SSS problem. Finally, experiments are conducted on the ORL, AR and YALE face databases to verify our arguments and to provide some insights for future study.

Introduction

In face recognition, we usually employ appearance-based methods [1], [2]. One primary advantage of appearance-based methods is that it is not necessary to create explicit representations or models for face images, since the model of a given face image is implicitly defined by the image itself [3]. When using appearance-based methods, we usually represent an image of size r×c pixels by a vector in a d-dimensional space, where d=rc. Although such an appearance-based representation is simple in form, the corresponding dimensionality d is too large to allow robust and fast recognition [3], and is typically larger than the number of samples in the training set, which leads to the so-called small sample size (SSS) problem. A common way to resolve this problem is to use dimensionality reduction techniques. Discriminant common vectors (DCV) [7], [8], Laplacianfaces (LAP) [17] and neighbourhood components analysis (NCA) [18] are three recently proposed methods which can effectively learn linear projection matrices for dimensionality reduction in face recognition.

DCV [7], [8] aims at solving the small sample size (SSS) problem in linear discriminant analysis (LDA) [4], [5], [6], which maximizes Fisher's linear discriminant criterion as follows:

$$W_{opt} = \arg\max_{W} J_{FLD}(W) = \arg\max_{W} \frac{|W^{T} S_b W|}{|W^{T} S_w W|}, \qquad (1)$$

where Sw is the within-class scatter matrix and Sb is the between-class scatter matrix. When the SSS problem takes place, Sw is typically singular and LDA cannot be applied directly. DCV remedies this by calculating the projection matrix in the null space of Sw, and as a result attains an optimum (infinite in value) of objective function (1).
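To make the null-space construction concrete, the following is a minimal numpy sketch of the DCV idea. It is our own illustration rather than the authors' code: the function name, the use of an SVD to obtain the range of Sw, and the numerical tolerance are our choices.

```python
import numpy as np

def dcv_projection(X, y, tol=1e-10):
    """Sketch of DCV: restrict the projection to the null space of S_w.

    X : (d, M) matrix with one d-dimensional sample per column
    y : (M,) class labels
    Assumes the SSS setting (d >> M) with linearly independent samples,
    so the null space of S_w is non-trivial.
    """
    y = np.asarray(y)
    classes = np.unique(y)
    # Within-class difference vectors span the range of S_w.
    Hw = np.hstack([X[:, y == c] - X[:, y == c].mean(axis=1, keepdims=True)
                    for c in classes])
    U, s, _ = np.linalg.svd(Hw, full_matrices=False)
    Uw = U[:, s > tol * s.max()]              # orthonormal basis of range(S_w)

    # Projecting any sample of a class onto null(S_w) yields the same
    # "common vector" for that class.
    Xc = np.hstack([X[:, y == c][:, :1] for c in classes])   # one sample per class
    common = Xc - Uw @ (Uw.T @ Xc)

    # An orthonormal basis of the span of the common-vector differences gives
    # the final projection matrix W (at most C-1 columns); W^T S_w W = 0,
    # so criterion (1) is driven to infinity.
    Uc, sc, _ = np.linalg.svd(common - common.mean(axis=1, keepdims=True),
                              full_matrices=False)
    return Uc[:, sc > tol * sc.max()]
```

Since every training sample of a class collapses onto its common vector in the projected space, the within-class scatter there is exactly zero, which is what drives criterion (1) to an infinite value.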

LAP [17] originates from the viewpoint of preserving the locality structure of the image space. To this end, it models the manifold [13], [14], [15] structure by a nearest-neighbour graph, constructs a face subspace by locality preserving projections (LPP) [16], and performs dimensionality reduction with a set of feature images called Laplacianfaces.
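As a concrete illustration of the LPP step behind Laplacianfaces, the sketch below builds a k-nearest-neighbour heat-kernel graph and solves the resulting generalized eigenproblem. It follows the spirit of [16], [17], but the parameter names, the symmetrization rule and the small ridge added to keep the problem well-posed are our own assumptions; Laplacianfaces instead applies PCA first in the SSS case.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, k=5, t=1.0, n_dims=2, ridge=1e-8):
    """Sketch of locality preserving projections (LPP) on samples X (M x d).

    Builds a k-nearest-neighbour graph with heat-kernel weights, forms the
    graph Laplacian L = D - W, and solves X^T L X a = lambda X^T D X a for
    the eigenvectors with the smallest eigenvalues.
    """
    M = X.shape[0]
    D2 = cdist(X, X, 'sqeuclidean')
    idx = np.argsort(D2, axis=1)[:, 1:k + 1]      # k nearest neighbours, self excluded
    W = np.zeros((M, M))
    rows = np.repeat(np.arange(M), k)
    W[rows, idx.ravel()] = np.exp(-D2[rows, idx.ravel()] / t)
    W = np.maximum(W, W.T)                        # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + ridge * np.eye(X.shape[1])  # ridge: our safeguard, not in [16]
    vals, vecs = eigh(A, B)                       # eigenvalues in ascending order
    return vecs[:, :n_dims]                       # columns = projection directions
```

In face recognition the d×d matrices above are highly rank-deficient because d is much larger than M, which is why Laplacianfaces first projects the data onto a PCA subspace; the ridge term here is only a stand-in for that step.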

NCA [18] aims at learning a Mahalanobis distance measure to be used in k-nearest-neighbour (KNN) classification. Subtly, it reduces learning such a Mahalanobis distance to learning a linear projection (or transformation) matrix, and at the same time avoids the matrix inversion required when computing the traditional Mahalanobis distance metric. The projection matrix in NCA is obtained by optimizing the KNN leave-one-out (LOO) classification performance on the training set, so the learned Mahalanobis distance metric, or equivalently the projection matrix, is directly related to the classification performance. This is the main characteristic of NCA, and it is quite different from the dimensionality reduction methods mentioned above (e.g. DCV, LAP), whose objective functions are not directly associated with the classification decision. By restricting the projection matrix in the distance metric learning to a non-square one, NCA can be used for dimensionality reduction [18].
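The stochastic leave-one-out objective that NCA maximizes can be written compactly. Below is a hedged numpy sketch of that objective only; the names and array shapes are our own conventions, and the gradient-ascent loop that [18] uses to maximize it over A is omitted.

```python
import numpy as np

def nca_objective(A, X, y):
    """Expected leave-one-out KNN accuracy that NCA maximizes (sketch of [18]).

    A : (p, d) projection matrix; the learned Mahalanobis metric is A^T A
    X : (M, d) training samples as rows
    y : (M,) class labels
    """
    y = np.asarray(y)
    Z = X @ A.T                                               # projected samples
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                              # p_ii = 0 (leave one out)
    P = np.exp(-d2)
    P /= P.sum(axis=1, keepdims=True)                         # p_ij: soft neighbour assignment
    same_class = (y[:, None] == y[None, :])
    return float((P * same_class).sum())                      # sum_i p_i, at most M
```

Because the learned metric is A^T A, no matrix inversion is needed to evaluate distances, and restricting A to have fewer rows than columns turns the metric learning into dimensionality reduction. The argument developed later in the paper is that, in the SSS setting, the projection matrix produced by DCV is already optimal for this objective.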

The three methods all claim to be superior to principal component analysis (PCA) [1] and PCA+LDA [4], [5], [6], namely: (1) DCV is superior to PCA and PCA+LDA in terms of recognition accuracy, efficiency and numerical stability [8]; (2) PCA and PCA+LDA can be obtained from different graph models in LAP, and LAP provides a better representation and achieves lower classification error rates in face recognition [17]; and (3) when labeled data are available, NCA performs better than the standard methods of PCA and LDA both in terms of classification performance in the projected representation and in terms of visualization of class separation [18]. However, there is no comparative study among them in the literature. The purpose of this paper is to fill this gap with a comparative study among them and to gain some insight from it. It is worthwhile to highlight our contributions in this paper as follows:

  • (1)

    We perform, for the first time in the literature, a comparative study among DCV, LAP and NCA, and argue that in the SSS problem (e.g. face recognition) the projection matrix yielded by DCV is the optimal solution to both NCA and LAP in terms of their respective objective functions, whereas neither NCA nor LAP may attain its own optimal solution.

  • (2)

    We show that DCV is more efficient than both NCA and LAP for both linear dimensionality reduction and the subsequent classification in the SSS problem.

  • (3)

    We reveal the essence of DCV, i.e. calculating its projection matrix is equivalent to solving a thin QR decomposition problem, which makes it easier both to understand and to extend to a nonlinear version by the kernel trick [22], [23], [24], [25] (see the sketch after this list).

  • (4)

    We experimentally characterize the application scope of DCV, namely, when MSV (defined in Section 4) is relatively small, DCV performs well, whereas when MSV is relatively large, it performs poorly.
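To illustrate contribution (3), the earlier DCV sketch can be rewritten using nothing but two thin QR factorizations: one for the within-class difference subspace and one for the common-vector differences. This is our own compact reading of the claim, not code from the paper, and it assumes linearly independent training samples so that the difference vectors below have full column rank.

```python
import numpy as np

def dcv_via_thin_qr(X, y):
    """DCV projection via two thin QR factorizations (our reading of point (3)).

    X : (d, M) samples as columns, y : (M,) labels, SSS setting (d >> M).
    """
    y = np.asarray(y)
    classes = np.unique(y)
    # 1) M - C within-class difference vectors span range(S_w); a thin QR
    #    gives an orthonormal basis Qw of that subspace.
    Hw = np.hstack([X[:, y == c][:, 1:] - X[:, y == c][:, :1] for c in classes])
    Qw, _ = np.linalg.qr(Hw)
    # 2) Common vectors: project one sample per class onto null(S_w).
    Xc = np.hstack([X[:, y == c][:, :1] for c in classes])
    common = Xc - Qw @ (Qw.T @ Xc)
    # 3) A second thin QR on the C - 1 common-vector differences yields
    #    the orthonormal DCV projection matrix.
    Qc, _ = np.linalg.qr(common[:, 1:] - common[:, :1])
    return Qc
```

No eigendecomposition is involved, and the same Gram-Schmidt view is what makes the kernel extension mentioned above straightforward.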

The rest of the paper is organized as follows. In Section 2, DCV, NCA and LAP are reviewed. In Section 3, we carry out a comparative study among these three methods in the SSS problem. In Section 4, we report experimental results on several face databases. Finally, in Section 5, we provide some concluding remarks and suggestions for future work.

Section snippets

Review of the three methods

Let the training set be composed of C classes, with the ith class containing Ni (>1) d-dimensional samples. Suppose that the training samples are linearly independent, which can generally be satisfied in such applications as face classification. Then there is a total of M=N1+N2+⋯+NC linearly independent training samples. Note that in such high-dimensional data classification tasks as face recognition, the SSS problem exists, namely d≫M generally holds.

We give two equivalent descriptions of the

A comparative study among the three methods

From Section 2, we can clearly observe that the three methods originate from different starting points, namely: (a) DCV aims at solving the small sample size problem in LDA, and to this end, it restricts the solution to the null space of the within-class scatter matrix Sw and attains an optimum (infinite in value) of objective function (1); (b) NCA intends to learn a Mahalanobis distance metric, or equivalently a projection matrix, through maximizing the KNN LOO classification performance on the

Experiments

In Section 3, we presented our theoretical arguments. Now, we carry out experiments on three face datasets, ORL, YALE and AR, in order to: (1) verify experimentally that the projection matrix of DCV provides the optimal solution to both NCA and LAP in terms of (14) and (16), respectively; (2) investigate the values of objective functions (14) and (16) attained by the projection matrices of NCA and LAP, respectively; (3) report the classification accuracies of DCV, NCA and LAP, and give the

Conclusion and future work

In this paper, we have carried out a comparative study among DCV, NCA and LAP and found that, in the SSS problem, the projection matrix of DCV is the optimal solution to both NCA and LAP in terms of their respective objective functions, whereas neither NCA nor LAP may achieve its optimal objective function value. Both theoretical analysis and experimental simulations are presented to verify our arguments. In addition, we show that DCV is much more efficient than both NCA and LAP in both calculating the

Acknowledgements

We thank Xiaofei He for valuable discussion and Jacob Goldberger for supplying the NCA code. This work was supported by the Natural Science Foundation of China under Grant No. 60473035.

References (27)

  • L.F. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
  • M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, Proceedings of the IEEE Conference on Computer Vision and...
  • H. Murase et al., Visual learning and recognition of 3-D objects from appearance, International Journal of Computer Vision (1995)
  • A.M. Martinez et al., PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • D.L. Swets et al., Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
  • C. Liu et al., Robust coding schemes for indexing and retrieval from large face databases, IEEE Transactions on Image Processing (2000)
  • H. Cevikalp, M. Wilke, Face recognition by using discriminative common vectors, Proceedings of the 17th International...
  • H. Cevikalp et al., Discriminative common vectors for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)
  • J. Liu, S. Chen, Equivalences among different null space based feature extraction methods for small sample size...
  • J. Yang et al., A generalised K–L expansion method which can deal with small sample size and high-dimensional problems, Pattern Analysis and Applications (2003)
  • R. Huang, Q. Liu, H. Lu, S. Ma, Solving the small size problem of LDA, Proceedings of the 16th International Conference...
  • Y. Chang, C. Hu, M. Turk, Manifold of facial expression, Proceedings of the IEEE International Workshop Analysis and...