Generalized multiple maximum scatter difference feature extraction using QR decomposition

https://doi.org/10.1016/j.jvcir.2014.04.009

Highlights

  • GMMSD employs QR decomposition rather than SVD.

  • GMMSD allows relatively-free selection of a suitable matrix to reduce dimension.

  • We reveal GMMSD’s relationship to other feature extraction methods.

  • Experimental results are presented to demonstrate the effectiveness of GMMSD.

Abstract

Multiple maximum scatter difference (MMSD) discriminant criterion is an effective feature extraction method that computes the discriminant vectors from both the range of the between-class scatter matrix and the null space of the within-class scatter matrix. However, MMSD involves two singular value decompositions (SVD), rendering the method impractical for high-dimensional data. In this paper, we propose a generalized MMSD (GMMSD) criterion for feature extraction and classification. GMMSD allows relatively-free selection of a suitable transformation matrix to reduce dimensions. Based on the GMMSD criterion, we demonstrate that the same discriminant information can be extracted by QR decomposition, which is more efficient than SVD. We then compare GMMSD with several classical feature extraction methods to justify its validity. Our experiments on three face databases and two facial expression databases demonstrate that GMMSD provides favorable recognition performance with high computational efficiency.

Introduction

In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminative information [1]. Generally, these methods fall into two classes: subspace-based methods and manifold learning-based methods. Popular linear methods include linear discriminant analysis (LDA) [2] and its extensions, such as the geometric mean approach to subspace selection for capturing potential discriminative information [3]. The manifold learning methods, mostly developed in the past decade, include isometric feature mapping (Isomap) [4], locally linear embedding (LLE) [5], Laplacian eigenmaps (LEM) [6], Hessian eigenmaps [7], locality preserving projection (LPP) [8], [9] and multiview Hessian regularization [10], [11]. Nonnegative matrix factorization [12], [13] and a double shrinking model for data compression [14] were introduced into manifold learning to encode the geometric structure of the data and interpret features for improved recognition results. While manifold learning methods often lead to better results, many of them require tedious parameter tuning to achieve good performance [15], [16], and some cannot handle test data directly [17]. In addition, to combine the advantages of the two classes, a novel proposal reformulates most existing subspace methods and manifold methods into a unified form [18], [19].

LDA is one of the most important linear methods for pattern recognition and representation; it maximizes the ratio of the trace of the between-class scatter matrix to the trace of the within-class scatter matrix. However, for tasks with very high-dimensional data, such as facial images, the conventional LDA method may suffer from the singularity problem. To avoid this problem, PCA has been applied to reduce the dimensionality of the high-dimensional vector space before employing LDA [20]. While PCA seeks projections that are optimal for image reconstruction from a low-dimensional space, it may remove dimensions that contain discriminant information required for pattern recognition, leading to low recognition accuracy. When the within-class scatter matrix Sw is singular or ill-conditioned, Friedman [21] added a diagonal matrix αI with α>0 to Sw. Since Sw is a symmetric positive semi-definite matrix, Sw+αI is nonsingular for any α>0, so this so-called R-LDA method solves the singularity problem. The weakness of R-LDA is that the dimensionality of the covariance matrix often exceeds ten thousand; processing such a large covariance matrix is impractical, especially when the computing platform is not sufficiently powerful. Huang et al. [22] introduced a more efficient null space method. The basic notion behind the method is that the null space of Sw is particularly effective in terms of discriminating ability, whereas that of the between-class scatter matrix Sb is useless. They proved that the null space of the total scatter matrix St is the common null space of both Sw and Sb. However, the method is often criticized for its high storage requirement and computational cost in applications such as face recognition or facial expression recognition. Chen et al. [23] argued that the eigenvectors of Sw corresponding to eigenvalues equal to or close to zero contain the most discriminant information.
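The R-LDA regularization trick described above can be illustrated in a few lines. This is a minimal NumPy sketch with synthetic data, not the paper's implementation: in the small sample size setting (fewer samples than dimensions), Sw is singular, but Sw + αI is full rank for any α > 0.

```python
import numpy as np

# Small sample size setting: 3 centered samples in 5 dimensions,
# so the within-class scatter Sw has rank at most 3 (singular in 5-D).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
Sw = X.T @ X                       # symmetric positive semi-definite
alpha = 1e-3

# Regularization: every eigenvalue of Sw + alpha*I is at least alpha,
# so the matrix is nonsingular for any alpha > 0.
print(np.linalg.matrix_rank(Sw))                      # at most 3
print(np.linalg.matrix_rank(Sw + alpha * np.eye(5)))  # 5 (full rank)
```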
In this context, Yu and Yang [24] proposed a direct linear discriminant analysis method that diagonalizes the between-class scatter matrix first and then the within-class scatter matrix.

It is known that between-class scatter and within-class scatter are two important measures of the separability of the projected samples. There are two different principles for jointly optimizing the two measures: multiplicative and additive. Applying the former principle to the multi-objective programming problem yields the Fisher discriminant criterion, e.g., LDA and R-LDA. The latter uses a difference criterion, in which the projection axes are sought by subtracting the within-class scatter from the between-class scatter. Song et al. [25] put forward a binary discriminant criterion, the maximum scatter difference (MSD), for pattern recognition, which uses the generalized scatter difference rather than the generalized Rayleigh quotient as a class separability measure, thereby avoiding the singularity issue under the small sample size problem. The drawback of the MSD classifier is that, being binary, it cannot be applied directly to multi-class classification tasks. In addition, the efficiency of MSD is greatly affected by variations in the number of classes, making it unsuitable for large-scale pattern recognition tasks. In [26], Song and colleagues generalized the classification-oriented binary criterion to the multiple MSD (MMSD) discriminant criterion for feature extraction. MMSD computes its discriminant vectors from both the range of the between-class scatter matrix and the null space of the within-class scatter matrix. Although MMSD improves speed and performance over MSD, it requires computing the SVD twice, and its computational complexity remains very high for high-dimensional databases.

Tao et al. [27] compared the two criteria, quotient and difference, and found that quotient-LDA can produce optimal discriminant vectors that are mutually uncorrelated, whereas difference-LDA cannot. Consequently, the recognition rate of quotient-LDA is usually higher than that of difference-LDA in the large sample size setting. However, for the quotient criterion, the denominator of the Rayleigh quotient is sometimes unstable in the small sample size setting: if some eigenvalues of Sw approach zero, the eigenvalues of Sw⁻¹Sb are quite sensitive to perturbations of Sw, which leads to a sharp decrease in recognition rate. In the difference criterion, by contrast, the stability of Sw matters little since the inverse of Sw is avoided. Yang et al. [28] proposed a Laplacian transform method based on the difference criterion for feature extraction and recognition, which formulates the Laplacian between-class scatter matrix and Laplacian within-class scatter matrix from image matrices; the objective function of the method is solved directly by generalized eigenvalue decomposition. Lu et al. [29] presented a discriminant locality preserving projections (DLPP) method based on the difference criterion; DLPP seeks to maximize the difference between the locality preserving between-class scatter and the locality preserving within-class scatter.
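The robustness of the difference criterion to a singular Sw can be sketched directly: the projection axes come from an ordinary symmetric eigendecomposition of Sb − c·Sw, so no inverse of Sw is ever formed. The function name and the balance parameter c below are illustrative, not the notation of any cited method.

```python
import numpy as np

def difference_criterion_axes(Sb, Sw, c=1.0, k=2):
    # Top-k eigenvectors of the symmetric matrix Sb - c*Sw.
    # No inverse of Sw is needed, so a singular or ill-conditioned
    # Sw causes no numerical trouble, unlike the quotient criterion.
    evals, evecs = np.linalg.eigh(Sb - c * Sw)
    order = np.argsort(evals)[::-1]        # largest eigenvalues first
    return evecs[:, order[:k]]

# Usage with tiny synthetic scatter matrices; Sw is deliberately singular.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)); Sb = B @ B.T    # full-rank PSD
W = rng.standard_normal((4, 2)); Sw = W @ W.T    # rank 2: singular
V = difference_criterion_axes(Sb, Sw, c=1.0, k=2)
print(V.shape)   # (4, 2), columns orthonormal
```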

In this paper, the generalized MMSD (GMMSD) criterion is proposed to calculate the discriminant vectors. GMMSD employs QR decomposition rather than SVD. It not only avoids the high computational complexity of MMSD on high-dimensional data, but also allows relatively-free selection of a suitable transformation matrix to improve classification performance. We conduct detailed comparisons between GMMSD and other competing methods, in particular the methods based on the difference criterion (i.e., MSD and MMSD), thereby providing performance evaluations of these methods for other researchers. In line with the essence of GMMSD, we also reveal its relationship to other feature extraction methods and evaluate them under our experimental framework. Interestingly, the results show that GMMSD performs more effectively and efficiently than MSD and MMSD.

The remaining sections of this paper are organized as follows. In Section 2, we give a brief review of the MMSD method for feature extraction. In Section 3, we propose a novel method for feature extraction and classification based on the MMSD criterion, called generalized MMSD. We reveal its relationship to a number of other key feature extraction methods in Section 4. In Section 5, experiments with face images and facial expression images are presented to demonstrate the effectiveness and efficiency of the GMMSD method. Concluding remarks are given in Section 6.

Section snippets

Feature extraction method based on MMSD

Assume that the dataset in $\mathbb{R}^m$ contains $n$ samples from $C$ classes, $x_i^k$, $i=1,2,\ldots,n_k$, where $n_k$ denotes the sample size of the $k$-th class, $\sum_{k=1}^{C} n_k = n$, and $x_i^k$ is the $i$-th sample in the $k$-th class. The centroid of the $k$-th class is defined by $\mu_k = \frac{1}{n_k}\sum_{i=1}^{n_k} x_i^k$ and the total centroid of the dataset is defined by $\mu = \frac{1}{n}\sum_{k=1}^{C}\sum_{i=1}^{n_k} x_i^k$. The between-class scatter matrix $S_b$ and within-class scatter matrix $S_w$ are then defined, respectively, as follows:

$$S_b = \sum_{k=1}^{C} n_k\,(\mu_k-\mu)(\mu_k-\mu)^T, \qquad S_w = \sum_{k=1}^{C}\sum_{i=1}^{n_k}(x_i^k-\mu_k)(x_i^k-\mu_k)^T$$

It is
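The definitions above translate directly into code. A minimal NumPy sketch (the function name is illustrative), with a sanity check of the standard identity that the between-class and within-class scatter sum to the total scatter St:

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute Sb and Sw exactly as defined above.
    X: (n, m) data matrix, one sample per row; y: (n,) class labels."""
    mu = X.mean(axis=0)                      # total centroid
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_k = Xc.mean(axis=0)               # class centroid
        Sb += len(Xc) * np.outer(mu_k - mu, mu_k - mu)
        Sw += (Xc - mu_k).T @ (Xc - mu_k)
    return Sb, Sw

# Sanity check on synthetic data: Sb + Sw equals the total scatter St.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
y = np.repeat([0, 1, 2, 3], 5)
Sb, Sw = scatter_matrices(X, y)
St = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))
print(np.allclose(Sb + Sw, St))   # True
```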

The proposed method

In this section, we first describe the GMMSD discriminant criterion in Section 3.1, then present the feature extraction method based on GMMSD in Section 3.2. Finally, we compare the computational complexity and memory requirements of GMMSD with those of PCA+LDA and MMSD.

Theoretical analysis of GMMSD, N-LDA, and MMSD

To further investigate GMMSD feature extraction method, we reveal its relationship to other feature extraction methods.

Experimental results

Because the difference criterion is better suited to small sample size problems such as face recognition and facial expression recognition (while the quotient criterion is better for large sample size problems such as character recognition) [27], in this section we evaluate the performance of the proposed method on two categories of databases: face databases and facial expression databases, where the face databases include the Yale [34], AR [35] and FERET [36] databases, and

Conclusions

In this paper, we propose a novel method, GMMSD, to extract features for pattern recognition, and we apply it to the small sample size problem. We compute the discriminant vectors from both the range of the whitened input data matrix A and the null space of the within-class scatter matrix using QR decomposition, instead of the computationally expensive SVD used by the original MMSD. Experimental results demonstrate that our method has good discriminating power and outperforms its direct competitors such as

Acknowledgments

The authors would like to thank Xin Guo for her help in finalizing the revision of the manuscript. The work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61331021 and 61071211, and by the Canada Research Chair Program.

References (49)

  • M. Kyperountas et al., Salient feature and reliable classifier selection for facial expression classification, Pattern Recogn. (2010)

  • R. Chellappa et al., Human and machine recognition of faces: a survey, Proc. IEEE (1995)

  • R. Fisher, The use of multiple measurements in taxonomic problems, Ann. Human Genet. (1936)

  • D. Tao et al., Geometric mean for subspace selection, IEEE Trans. Pattern Anal. Machine Intell. (2009)

  • J.B. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)

  • D. De Ridder, R.P. Duin, Locally linear embedding for classification, Pattern Recognition Group, Dept. of Imaging...

  • M. Belkin et al., Laplacian eigenmaps and spectral techniques for embedding and clustering, Adv. Neural Inform. Process. Syst. (2001)

  • D.L. Donoho et al., Hessian eigenmaps: locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. (2003)

  • P. Niyogi, Locality preserving projections, Adv. Neural Inform. Process. Syst. (2004)

  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Machine Intell. (2005)

  • W. Liu et al., Multiview Hessian regularization for image annotation, IEEE Trans. Image Process. (2013)

  • N. Guan et al., Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent, IEEE Trans. Image Process. (2011)

  • N. Guan et al., NeNMF: an optimal gradient method for nonnegative matrix factorization, IEEE Trans. Signal Process. (2012)

  • T. Zhou et al., Double shrinking sparse dimension reduction, IEEE Trans. Image Process. (2013)