Generalized multiple maximum scatter difference feature extraction using QR decomposition

https://doi.org/10.1016/j.jvcir.2014.04.009

Highlights

  • GMMSD employs QR decomposition rather than SVD.

  • GMMSD allows relatively-free selection of a suitable matrix to reduce dimension.

  • We reveal GMMSD’s relationship to other feature extraction methods.

  • Experimental results are presented to demonstrate the effectiveness of GMMSD.

Abstract

Multiple maximum scatter difference (MMSD) discriminant criterion is an effective feature extraction method that computes the discriminant vectors from both the range of the between-class scatter matrix and the null space of the within-class scatter matrix. However, MMSD involves two singular value decompositions (SVD), rendering the method impractical for high-dimensional data. In this paper, we propose a generalized MMSD (GMMSD) criterion for feature extraction and classification. GMMSD allows relatively-free selection of a suitable transformation matrix to reduce dimensions. Based on the GMMSD criterion, we demonstrate that the same discriminant information can be extracted by QR decomposition, which is more efficient than SVD. We then compare GMMSD with several classical feature extraction methods to justify its validity. Our experiments on three face databases and two facial expression databases demonstrate that GMMSD provides favorable recognition performance with high computational efficiency.

Introduction

In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminative information [1]. Generally, these methods fall into two classes: subspace-based methods and manifold learning-based methods. Popular linear methods include linear discriminant analysis (LDA) [2] and its extensions, such as the geometric mean approach to subspace selection for capturing potential discriminative information [3]. The manifold learning methods, mostly developed in the past decade, include isometric feature mapping (Isomap) [4], locally linear embedding (LLE) [5], Laplacian eigenmaps (LEM) [6], Hessian eigenmaps [7], locality preserving projection (LPP) [8], [9] and multiview Hessian regularization [10], [11]. Nonnegative matrix factorization [12], [13] and a double shrinking model for data compression [14] were introduced into manifold learning to encode the geometric structure of the data and interpret features for improved recognition results. While manifold learning methods often lead to better results, many of them require tedious parameter tuning to achieve good performance [15], [16], and some cannot handle test data directly [17]. In addition, to combine the advantages of the two classes, a novel proposal reformulates most existing subspace methods and manifold methods into a unified form [18], [19].

LDA is one of the most important linear methods for pattern recognition and representation; it maximizes the ratio of the trace of the between-class scatter matrix to the trace of the within-class scatter matrix. However, for tasks with very high-dimensional data, such as facial images, the conventional LDA method may suffer from the singularity problem. To avoid this problem, PCA has been applied to reduce the dimensionality of the high-dimensional vector space before employing LDA [20]. While PCA seeks projections that are optimal for image reconstruction from a low-dimensional space, it may remove dimensions that contain discriminant information required for pattern recognition, leading to low recognition accuracy. When the within-class scatter matrix Sw is singular or ill-conditioned, Friedman [21] added a diagonal matrix αI with α>0 to Sw. Since Sw is a symmetric positive semi-definite matrix, Sw+αI is nonsingular for any α>0, so this so-called R-LDA method solves the singularity problem. The weakness of R-LDA is that the dimensionality of the covariance matrix often exceeds ten thousand; processing such a large covariance matrix is impractical, especially when the computing platform is not sufficiently powerful. Huang et al. [22] introduced a more efficient null space method. The basic notion behind the method is that the null space of Sw is particularly effective in terms of discriminating ability, whereas that of the between-class scatter matrix Sb is useless. They proved that the null space of the total scatter matrix St is the common null space of both Sw and Sb. However, the method is often criticized for its high storage requirement and computational cost in applications such as face recognition or facial expression recognition. Chen et al. [23] argued that the eigenvectors of Sw corresponding to eigenvalues equal to or close to zero contain the most discriminant information.
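The R-LDA regularization trick described above can be illustrated in a few lines. This is a minimal NumPy sketch with synthetic data, not the paper's implementation: in the small sample size setting (fewer samples than dimensions), Sw is singular, but Sw + αI is full rank for any α > 0.

```python
import numpy as np

# Small sample size setting: 3 centered samples in 5 dimensions,
# so the within-class scatter Sw has rank at most 3 (singular in 5-D).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
Sw = X.T @ X                       # symmetric positive semi-definite
alpha = 1e-3

# Regularization: every eigenvalue of Sw + alpha*I is at least alpha,
# so the matrix is nonsingular for any alpha > 0.
print(np.linalg.matrix_rank(Sw))                      # at most 3
print(np.linalg.matrix_rank(Sw + alpha * np.eye(5)))  # 5 (full rank)
```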
In this context, Yu and Yang [24] proposed a direct linear discriminant analysis method that diagonalizes the between-class scatter matrix first and then the within-class scatter matrix.

It is known that between-class scatter and within-class scatter are two important measures of the separability of the projected samples. There are two different principles for jointly optimizing the two measures: multiplicative and additive. Applying the former principle to the multi-objective programming problem yields the Fisher discriminant criterion, e.g., LDA and R-LDA. The latter uses a difference criterion, in which the projection axes are sought by subtracting the within-class scatter from the between-class scatter. Song et al. [25] put forward a binary discriminant criterion, the maximum scatter difference (MSD), for pattern recognition, which uses the generalized scatter difference rather than the generalized Rayleigh quotient as a class separability measure, thereby avoiding the singularity issue under the small sample size problem. The drawback of the MSD classifier is that, being binary, it cannot be applied directly to multi-class classification tasks. In addition, the efficiency of MSD is greatly affected by variations in the number of classes, making it unsuitable for large-scale pattern recognition tasks. In [26], Song and colleagues generalized the classification-oriented binary criterion to the multiple MSD (MMSD) discriminant criterion for feature extraction. MMSD computes its discriminant vectors from both the range of the between-class scatter matrix and the null space of the within-class scatter matrix. Although MMSD improves speed and performance over MSD, it requires computing the SVD twice, and its computational complexity remains very high for high-dimensional databases.

Tao et al. [27] compared the two criteria, quotient and difference, and found that quotient-LDA can produce optimal discriminant vectors that are mutually uncorrelated, whereas difference-LDA cannot. Consequently, the recognition rate of quotient-LDA is usually higher than that of difference-LDA in the large sample size setting. However, for the quotient criterion, the denominator of the Rayleigh quotient is sometimes unstable in the small sample size setting: if some eigenvalues of Sw approach zero, the eigenvalues of Sw⁻¹Sb are quite sensitive to perturbations of Sw, which leads to a sharp decrease in recognition rate. In the difference criterion, by contrast, the stability of Sw matters little since the inverse of Sw is avoided. Yang et al. [28] proposed a Laplacian transform method based on the difference criterion for feature extraction and recognition, which formulates the Laplacian between-class scatter matrix and Laplacian within-class scatter matrix from image matrices; the objective function of the method is solved directly by generalized eigenvalue decomposition. Lu et al. [29] presented a discriminant locality preserving projections (DLPP) method based on the difference criterion; DLPP seeks to maximize the difference between the locality preserving between-class scatter and the locality preserving within-class scatter.
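The robustness of the difference criterion to a singular Sw can be sketched directly: the projection axes come from an ordinary symmetric eigendecomposition of Sb − c·Sw, so no inverse of Sw is ever formed. The function name and the balance parameter c below are illustrative, not the notation of any cited method.

```python
import numpy as np

def difference_criterion_axes(Sb, Sw, c=1.0, k=2):
    # Top-k eigenvectors of the symmetric matrix Sb - c*Sw.
    # No inverse of Sw is needed, so a singular or ill-conditioned
    # Sw causes no numerical trouble, unlike the quotient criterion.
    evals, evecs = np.linalg.eigh(Sb - c * Sw)
    order = np.argsort(evals)[::-1]        # largest eigenvalues first
    return evecs[:, order[:k]]

# Usage with tiny synthetic scatter matrices; Sw is deliberately singular.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)); Sb = B @ B.T    # full-rank PSD
W = rng.standard_normal((4, 2)); Sw = W @ W.T    # rank 2: singular
V = difference_criterion_axes(Sb, Sw, c=1.0, k=2)
print(V.shape)   # (4, 2), columns orthonormal
```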

In this paper, the generalized MMSD (GMMSD) criterion is proposed to calculate the discriminant vectors. GMMSD employs QR decomposition rather than SVD. It not only avoids the high computational complexity of MMSD on high-dimensional data, but also allows relatively-free selection of a suitable transformation matrix to improve classification performance. We conduct detailed comparisons between GMMSD and other competing methods, in particular the methods based on the difference criterion (i.e., MSD and MMSD), thereby providing performance evaluations of these methods for other researchers. In line with the essence of GMMSD, we also reveal its relationship to other feature extraction methods and evaluate them under our experimental framework. Interestingly, the results show that GMMSD performs more effectively and efficiently than MSD and MMSD.

The remaining sections of this paper are organized as follows. In Section 2, we give a brief review of the MMSD method for feature extraction. In Section 3, we propose a novel method for feature extraction and classification based on the MMSD criterion, called generalized MMSD. We reveal its relationship to a number of other key feature extraction methods in Section 4. In Section 5, experiments with face images and facial expression images are presented to demonstrate the effectiveness and efficiency of the GMMSD method. Concluding remarks are given in Section 6.

Section snippets

Feature extraction method based on MMSD

Assume that the dataset in $\mathbb{R}^m$ contains $n$ samples from $C$ classes, $x_i^k$, $i=1,2,\ldots,n_k$, where $n_k$ denotes the sample size of the $k$-th class, $\sum_{k=1}^{C} n_k = n$, and $x_i^k$ is the $i$-th sample in the $k$-th class. The centroid of the $k$-th class is defined by $\mu_k = \frac{1}{n_k}\sum_{i=1}^{n_k} x_i^k$ and the total centroid of the dataset is defined by $\mu = \frac{1}{n}\sum_{k=1}^{C}\sum_{i=1}^{n_k} x_i^k$. The between-class scatter matrix $S_b$ and within-class scatter matrix $S_w$ are then defined, respectively, as follows:

$$S_b = \sum_{k=1}^{C} n_k\,(\mu_k-\mu)(\mu_k-\mu)^T, \qquad S_w = \sum_{k=1}^{C}\sum_{i=1}^{n_k}(x_i^k-\mu_k)(x_i^k-\mu_k)^T$$

It is
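The definitions above translate directly into code. A minimal NumPy sketch (the function name is illustrative), with a sanity check of the standard identity that the between-class and within-class scatter sum to the total scatter St:

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute Sb and Sw exactly as defined above.
    X: (n, m) data matrix, one sample per row; y: (n,) class labels."""
    mu = X.mean(axis=0)                      # total centroid
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_k = Xc.mean(axis=0)               # class centroid
        Sb += len(Xc) * np.outer(mu_k - mu, mu_k - mu)
        Sw += (Xc - mu_k).T @ (Xc - mu_k)
    return Sb, Sw

# Sanity check on synthetic data: Sb + Sw equals the total scatter St.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
y = np.repeat([0, 1, 2, 3], 5)
Sb, Sw = scatter_matrices(X, y)
St = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))
print(np.allclose(Sb + Sw, St))   # True
```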

The proposed method

In this section, we first describe the GMMSD discriminant criterion in Section 3.1, then present the feature extraction method based on GMMSD in Section 3.2. Finally, we compare the computational complexity and memory requirements of GMMSD with those of PCA+LDA and MMSD.

Theoretical analysis of GMMSD, N-LDA, and MMSD

To further investigate GMMSD feature extraction method, we reveal its relationship to other feature extraction methods.

Experimental results

Because the difference criterion is better suited to small sample size problems such as face recognition and facial expression recognition (while the quotient criterion is better for large sample size problems such as character recognition) [27], in this section we evaluate the performance of the proposed method on two categories of databases: face databases and facial expression databases, where the face databases include the Yale [34], AR [35] and FERET [36] databases, and

Conclusions

In this paper, we propose a novel method, GMMSD, to extract features for pattern recognition, and we apply it to the small sample size problem. We compute the discriminant vectors from both the range of the whitened input data matrix A and the null space of the within-class scatter matrix using QR decomposition, instead of the computationally expensive SVD used by the original MMSD. Experimental results demonstrate that our method has good discriminating power and outperforms its direct competitors such as

Acknowledgments

The authors would like to thank Xin Guo for her help in finalizing the revision of the manuscript. The work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61331021 and 61071211, and by the Canada Research Chair Program.

References (49)

  • M. Kyperountas et al., Salient feature and reliable classifier selection for facial expression classification, Pattern Recogn. (2010)

  • R. Chellappa et al., Human and machine recognition of faces: a survey, Proc. IEEE (1995)

  • R. Fisher, The use of multiple measurements in taxonomic problems, Ann. Human Genet. (1936)

  • D. Tao et al., Geometric mean for subspace selection, IEEE Trans. Pattern Anal. Machine Intell. (2009)

  • J.B. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)

  • D. De Ridder, R.P. Duin, Locally linear embedding for classification, Pattern Recognition Group, Dept. of Imaging...

  • M. Belkin et al., Laplacian eigenmaps and spectral techniques for embedding and clustering, Adv. Neural Inform. Process. Syst. (2001)

  • D.L. Donoho et al., Hessian eigenmaps: locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. (2003)

  • P. Niyogi, Locality preserving projections, Adv. Neural Inform. Process. Syst. (2004)

  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Machine Intell. (2005)

  • W. Liu et al., Multiview Hessian regularization for image annotation, IEEE Trans. Image Process. (2013)

  • N. Guan et al., Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent, IEEE Trans. Image Process. (2011)

  • N. Guan et al., NeNMF: an optimal gradient method for nonnegative matrix factorization, IEEE Trans. Signal Process. (2012)

  • T. Zhou et al., Double shrinking sparse dimension reduction, IEEE Trans. Image Process. (2013)