Discriminant common vectors versus neighbourhood components analysis and Laplacianfaces: A comparative study in small sample size problem

https://doi.org/10.1016/j.imavis.2005.11.007

Abstract

Discriminant common vectors (DCV), neighbourhood components analysis (NCA) and Laplacianfaces (LAP) are three recently proposed methods which can effectively learn linear projection matrices for dimensionality reduction in face recognition, where the dimension of the sample space is typically larger than the number of samples in the training set and consequently the so-called small sample size (SSS) problem exists. The three methods obtain their respective projection matrices from different objective functions, and all claim to be superior to methods such as principal component analysis (PCA) and PCA plus linear discriminant analysis (PCA+LDA) in terms of classification accuracy. However, no comparative study among them has been reported in the literature. In this paper, we carry out such a comparative study in face recognition (or, more generally, in the SSS problem) and argue that the projection matrix yielded by DCV is the optimal solution to both NCA and LAP in terms of their respective objective functions, whereas neither NCA nor LAP may attain its own optimal solution. In addition, we show that DCV is more efficient than both NCA and LAP for both linear dimensionality reduction and the subsequent classification in the SSS problem. Finally, experiments are conducted on the ORL, AR and YALE face databases to verify our arguments and to provide some insights for future study.

Introduction

In face recognition, we usually employ appearance-based methods [1], [2]. One primary advantage of appearance-based methods is that it is not necessary to create explicit representations or models for face images, since the model of a given face image is implicitly defined by the image itself [3]. When using appearance-based methods, we usually represent an image of size r×c pixels by a vector in a d-dimensional space, where d=rc. Although such an appearance-based representation is simple in form, the corresponding dimensionality d is too large to allow robust and fast recognition [3], and is typically larger than the number of samples in the training set, which leads to the so-called small sample size (SSS) problem. A common way to resolve this problem is to use dimensionality reduction techniques. Discriminant common vectors (DCV) [7], [8], Laplacianfaces (LAP) [17] and neighbourhood components analysis (NCA) [18] are three recently proposed methods which can effectively learn linear projection matrices for dimensionality reduction in face recognition.

DCV [7], [8] aims at solving the small sample size (SSS) problem in linear discriminant analysis (LDA) [4], [5], [6], which maximizes Fisher's linear discriminant criterion as follows:

$$W_{opt} = \arg\max_{W} J_{FLD}(W) = \arg\max_{W} \frac{|W^{T} S_b W|}{|W^{T} S_w W|}, \qquad (1)$$

where Sw is the within-class scatter matrix and Sb is the between-class scatter matrix. When the SSS problem takes place, Sw is typically singular and LDA cannot be applied directly. DCV remedies this by calculating the projection matrix in the null space of Sw, and as a result attains an optimum (infinite in value) of objective function (1).
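To make the null-space construction concrete, the following is a minimal numpy sketch of the DCV idea. It is our own illustration rather than the authors' code: the function name, the use of an SVD to obtain the range of Sw, and the numerical tolerance are our choices.

```python
import numpy as np

def dcv_projection(X, y, tol=1e-10):
    """Sketch of DCV: restrict the projection to the null space of S_w.

    X : (d, M) matrix with one d-dimensional sample per column
    y : (M,) class labels
    Assumes the SSS setting (d >> M) with linearly independent samples,
    so the null space of S_w is non-trivial.
    """
    y = np.asarray(y)
    classes = np.unique(y)
    # Within-class difference vectors span the range of S_w.
    Hw = np.hstack([X[:, y == c] - X[:, y == c].mean(axis=1, keepdims=True)
                    for c in classes])
    U, s, _ = np.linalg.svd(Hw, full_matrices=False)
    Uw = U[:, s > tol * s.max()]              # orthonormal basis of range(S_w)

    # Projecting any sample of a class onto null(S_w) yields the same
    # "common vector" for that class.
    Xc = np.hstack([X[:, y == c][:, :1] for c in classes])   # one sample per class
    common = Xc - Uw @ (Uw.T @ Xc)

    # An orthonormal basis of the span of the common-vector differences gives
    # the final projection matrix W (at most C-1 columns); W^T S_w W = 0,
    # so criterion (1) is driven to infinity.
    Uc, sc, _ = np.linalg.svd(common - common.mean(axis=1, keepdims=True),
                              full_matrices=False)
    return Uc[:, sc > tol * sc.max()]
```

Since every training sample of a class collapses onto its common vector in the projected space, the within-class scatter there is exactly zero, which is what drives criterion (1) to an infinite value.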

LAP [17] originates from the viewpoint of preserving the locality structure of the image space. To this end, it models the manifold [13], [14], [15] structure by a nearest-neighbour graph, constructs a face subspace by locality preserving projections (LPP) [16], and performs dimensionality reduction with a set of feature images called Laplacianfaces.
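As a concrete illustration of the LPP step behind Laplacianfaces, the sketch below builds a k-nearest-neighbour heat-kernel graph and solves the resulting generalized eigenproblem. It follows the spirit of [16], [17], but the parameter names, the symmetrization rule and the small ridge added to keep the problem well-posed are our own assumptions; Laplacianfaces instead applies PCA first in the SSS case.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, k=5, t=1.0, n_dims=2, ridge=1e-8):
    """Sketch of locality preserving projections (LPP) on samples X (M x d).

    Builds a k-nearest-neighbour graph with heat-kernel weights, forms the
    graph Laplacian L = D - W, and solves X^T L X a = lambda X^T D X a for
    the eigenvectors with the smallest eigenvalues.
    """
    M = X.shape[0]
    D2 = cdist(X, X, 'sqeuclidean')
    idx = np.argsort(D2, axis=1)[:, 1:k + 1]      # k nearest neighbours, self excluded
    W = np.zeros((M, M))
    rows = np.repeat(np.arange(M), k)
    W[rows, idx.ravel()] = np.exp(-D2[rows, idx.ravel()] / t)
    W = np.maximum(W, W.T)                        # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + ridge * np.eye(X.shape[1])  # ridge: our safeguard, not in [16]
    vals, vecs = eigh(A, B)                       # eigenvalues in ascending order
    return vecs[:, :n_dims]                       # columns = projection directions
```

In face recognition the d×d matrices above are highly rank-deficient because d is much larger than M, which is why Laplacianfaces first projects the data onto a PCA subspace; the ridge term here is only a stand-in for that step.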

NCA [18] aims at learning a Mahalanobis distance measure to be used in k-nearest-neighbour (KNN) classification. Subtly, it reduces learning such a Mahalanobis distance to learning a linear projection (or transformation) matrix, and at the same time avoids the matrix inversion required when computing the traditional Mahalanobis distance metric. The projection matrix in NCA is obtained by optimizing the KNN leave-one-out (LOO) classification performance on the training set, so the learned Mahalanobis distance metric, or equivalently the projection matrix, is directly related to the classification performance. This is the main characteristic of NCA, and it is quite different from the dimensionality reduction methods mentioned above (e.g. DCV, LAP), whose objective functions are not directly associated with the classification decision. By restricting the projection matrix in the distance metric learning to a non-square one, NCA can be used for dimensionality reduction [18].
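The stochastic leave-one-out objective that NCA maximizes can be written compactly. Below is a hedged numpy sketch of that objective only; the names and array shapes are our own conventions, and the gradient-ascent loop that [18] uses to maximize it over A is omitted.

```python
import numpy as np

def nca_objective(A, X, y):
    """Expected leave-one-out KNN accuracy that NCA maximizes (sketch of [18]).

    A : (p, d) projection matrix; the learned Mahalanobis metric is A^T A
    X : (M, d) training samples as rows
    y : (M,) class labels
    """
    y = np.asarray(y)
    Z = X @ A.T                                               # projected samples
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                              # p_ii = 0 (leave one out)
    P = np.exp(-d2)
    P /= P.sum(axis=1, keepdims=True)                         # p_ij: soft neighbour assignment
    same_class = (y[:, None] == y[None, :])
    return float((P * same_class).sum())                      # sum_i p_i, at most M
```

Because the learned metric is A^T A, no matrix inversion is needed to evaluate distances, and restricting A to have fewer rows than columns turns the metric learning into dimensionality reduction. The argument developed later in the paper is that, in the SSS setting, the projection matrix produced by DCV is already optimal for this objective.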

The three methods all claim to be superior to principal component analysis (PCA) [1] and PCA+LDA [4], [5], [6], namely: (1) DCV is superior to PCA and PCA+LDA in terms of recognition accuracy, efficiency and numerical stability [8]; (2) PCA and PCA+LDA can be obtained from different graph models in LAP, and LAP provides a better representation and achieves lower classification error rates in face recognition [17]; and (3) when labeled data are available, NCA performs better than the standard methods of PCA and LDA both in terms of classification performance in the projected representation and in terms of visualization of class separation [18]. However, there is no comparative study among them in the literature. The purpose of this paper is to fill this gap with a comparative study among them and to gain some insight from it. It is worthwhile to highlight our contributions in this paper as follows:

  • (1)

    We perform, for the first time in the literature, a comparative study among DCV, LAP and NCA, and argue that in the SSS problem (e.g. face recognition) the projection matrix yielded by DCV is the optimal solution to both NCA and LAP in terms of their respective objective functions, whereas neither NCA nor LAP may attain its own optimal solution.

  • (2)

    We show that DCV is more efficient than both NCA and LAP for both linear dimensionality reduction and the subsequent classification in the SSS problem.

  • (3)

    We reveal the essence of DCV, i.e. calculating its projection matrix is equivalent to solving a thin QR decomposition problem, which makes it easier both to understand and to extend to a nonlinear version by the kernel trick [22], [23], [24], [25] (see the sketch after this list).

  • (4)

    We experimentally characterize the application scope of DCV, namely, when MSV (defined in Section 4) is relatively small, DCV performs well, whereas when MSV is relatively large, it performs poorly.
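To illustrate contribution (3), the earlier DCV sketch can be rewritten using nothing but two thin QR factorizations: one for the within-class difference subspace and one for the common-vector differences. This is our own compact reading of the claim, not code from the paper, and it assumes linearly independent training samples so that the difference vectors below have full column rank.

```python
import numpy as np

def dcv_via_thin_qr(X, y):
    """DCV projection via two thin QR factorizations (our reading of point (3)).

    X : (d, M) samples as columns, y : (M,) labels, SSS setting (d >> M).
    """
    y = np.asarray(y)
    classes = np.unique(y)
    # 1) M - C within-class difference vectors span range(S_w); a thin QR
    #    gives an orthonormal basis Qw of that subspace.
    Hw = np.hstack([X[:, y == c][:, 1:] - X[:, y == c][:, :1] for c in classes])
    Qw, _ = np.linalg.qr(Hw)
    # 2) Common vectors: project one sample per class onto null(S_w).
    Xc = np.hstack([X[:, y == c][:, :1] for c in classes])
    common = Xc - Qw @ (Qw.T @ Xc)
    # 3) A second thin QR on the C - 1 common-vector differences yields
    #    the orthonormal DCV projection matrix.
    Qc, _ = np.linalg.qr(common[:, 1:] - common[:, :1])
    return Qc
```

No eigendecomposition is involved, and the same Gram-Schmidt view is what makes the kernel extension mentioned above straightforward.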

The rest of the paper is organized as follows. In Section 2, DCV, NCA and LAP are reviewed. In Section 3, we carry out a comparative study among these three methods in the SSS problem. In Section 4, we report experimental results on several face databases. Finally, in Section 5, we provide some concluding remarks and suggestions for future work.

Section snippets

Review of the three methods

Let the training set be composed of C classes, with the ith class containing Ni (>1) d-dimensional samples. Suppose that the training samples are linearly independent, which can generally be satisfied in such applications as face classification. Then there is a total of M=N1+N2+⋯+NC linearly independent training samples. Note that in such high-dimensional data classification tasks as face recognition, the SSS problem exists, namely d≫M generally holds.

We give two equivalent descriptions of the

A comparative study among the three methods

From Section 2, we can clearly observe that the three methods originate from different starting points, namely: (a) DCV aims at solving the small sample size problem in LDA, and to this end, it restricts the solution to the null space of the within-class scatter matrix Sw and attains an optimum (infinite in value) of objective function (1); (b) NCA intends to learn a Mahalanobis distance metric, or equivalently a projection matrix, through maximizing the KNN LOO classification performance on the

Experiments

In Section 3, we presented our theoretical arguments. Now, we carry out experiments on three face datasets, ORL, YALE and AR, in order to: (1) verify experimentally that the projection matrix of DCV provides the optimal solution to both NCA and LAP in terms of (14) and (16), respectively; (2) investigate the values of objective functions (14) and (16) attained by the projection matrices of NCA and LAP, respectively; (3) report the classification accuracies of DCV, NCA and LAP, and give the

Conclusion and future work

In this paper, we have carried out a comparative study among DCV, NCA and LAP and found that, in the SSS problem, the projection matrix of DCV is the optimal solution to both NCA and LAP in terms of their respective objective functions, whereas neither NCA nor LAP may achieve its optimal objective function value. Both theoretical analysis and experimental simulations are presented to verify our arguments. In addition, we show that DCV is much more efficient than both NCA and LAP in both calculating the

Acknowledgements

We thank Xiaofei He for valuable discussion and Jacob Goldberger for supplying the NCA code. This work was supported by the Natural Science Foundation of China under Grant No. 60473035.

References (27)

  • L.F. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
  • M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, Proceedings of the IEEE Conference on Computer Vision and...
  • H. Murase et al., Visual learning and recognition of 3-D objects from appearance, International Journal of Computer Vision (1995)
  • A.M. Martinez et al., PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • D.L. Swets et al., Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
  • C. Liu et al., Robust coding schemes for indexing and retrieval from large face databases, IEEE Transactions on Image Processing (2000)
  • H. Cevikalp, M. Wilke, Face recognition by using discriminative common vectors, Proceedings of the 17th International...
  • H. Cevikalp et al., Discriminative common vectors for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)
  • J. Liu, S. Chen, Equivalences among different null space based feature extraction methods for small sample size...
  • J. Yang et al., A generalised K–L expansion method which can deal with small sample size and high-dimensional problems, Pattern Analysis and Applications (2003)
  • R. Huang, Q. Liu, H. Lu, S. Ma, Solving the small size problem of LDA, Proceedings of the 16th International Conference...
  • Y. Chang, C. Hu, M. Turk, Manifold of facial expression, Proceedings of the IEEE International Workshop Analysis and...