Abstract
In this paper, we propose a graph maximum margin criterion (GMMC) that provides a unified way to overcome the small sample size problem encountered by algorithms formulated in the general graph embedding framework. The proposed GMMC-based feature extraction algorithms compute the discriminant vectors by maximizing the difference between the graph between-class scatter matrix and the graph within-class scatter matrix, so the singularity problem is avoided. An efficient and stable algorithm for implementing GMMC is also proposed, and the eigenvalue distribution of GMMC is analyzed. Experiments on the ORL, PIE and AR face databases show the effectiveness of the proposed GMMC-based feature extraction methods.
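As a minimal sketch of the idea described above (not the paper's exact algorithm), the discriminant vectors can be taken as the leading eigenvectors of the difference \(S_{b}^{L} -\alpha S_{w}^{L}\), which requires no matrix inversion and therefore no treatment of singularity. The graph matrices `L_b`, `L_w` and the parameter `alpha` below are illustrative placeholders, not the paper's specific graph construction:

```python
# Hedged GMMC-style sketch: discriminant vectors are the leading
# eigenvectors of S_b^L - alpha * S_w^L, so no matrix inversion (and
# hence no small-sample-size singularity problem) is involved.
import numpy as np

def gmmc_directions(X, L_b, L_w, alpha=1.0, n_components=2):
    """X: d x n data matrix; L_b, L_w: n x n graph matrices (placeholders)."""
    S_b = X @ L_b @ X.T              # graph between-class scatter (d x d)
    S_w = X @ L_w @ X.T              # graph within-class scatter (d x d)
    M = S_b - alpha * S_w            # difference criterion, always well defined
    vals, vecs = np.linalg.eigh((M + M.T) / 2)   # symmetric eigensolver
    order = np.argsort(vals)[::-1]               # largest eigenvalues first
    return vecs[:, order[:n_components]]

# Toy usage: 5-dimensional data, 4 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
L_b = np.eye(4) - np.full((4, 4), 0.25)   # a centering matrix as a stand-in
L_w = np.eye(4)
W = gmmc_directions(X, L_b, L_w, alpha=0.5)
print(W.shape)   # (5, 2)
```

Because the criterion is a difference rather than a ratio, the projection exists even when \(S_{w}^{L}\) is singular, which is the crux of the small sample size setting.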
Acknowledgments
This research is supported by the National Natural Science Foundation of China (No. 61572033), the Natural Science Foundation of the Education Department of Anhui Province of China (No. KJ2015ZD08), and the Anhui Provincial Natural Science Foundation (No. 1308085MF95).
Appendix
Theorem 1 Let \(u\in {\mathbb {R}}^{n\times 1}\) be an eigenvector of the matrix \((L^{b}-\alpha L^{w})X^{T}X\) corresponding to the eigenvalue \(\lambda \). Then \(Xu\) is an eigenvector of the matrix \(S_{b}^{L} -\alpha S_{w}^{L}\) corresponding to the same eigenvalue \(\lambda \).
Proof
Since \(u\in {\mathbb {R}}^{n\times 1}\) is an eigenvector of the matrix \((L^{b}-\alpha L^{w})X^{T}X\) corresponding to the eigenvalue \(\lambda \), we have \((L^{b}-\alpha L^{w})X^{T}Xu=\lambda u\). Multiplying both sides on the left by \(X\) gives \(X(L^{b}-\alpha L^{w})X^{T}(Xu)=\lambda (Xu)\), that is, \((S_{b}^{L} -\alpha S_{w}^{L} )(Xu)=\lambda (Xu)\).
\(\square \)
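Theorem 1 can be checked numerically. The sketch below assumes the usual graph-embedding form of the scatter matrices, \(S_{b}^{L}=XL^{b}X^{T}\) and \(S_{w}^{L}=XL^{w}X^{T}\); the sizes and random graph matrices are arbitrary choices for the experiment:

```python
# Numerical check of Theorem 1: if (L^b - alpha L^w) X^T X u = lambda u,
# then X u is an eigenvector of X (L^b - alpha L^w) X^T = S_b^L - alpha S_w^L
# with the same eigenvalue.  Sizes and graph matrices are illustrative.
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha = 6, 4, 0.7
X = rng.standard_normal((d, n))
L_b = rng.standard_normal((n, n)); L_b = L_b + L_b.T   # symmetric graph matrix
L_w = rng.standard_normal((n, n)); L_w = L_w + L_w.T

A = (L_b - alpha * L_w) @ X.T @ X          # n x n, generally non-symmetric
lams, U = np.linalg.eig(A)
lam, u = lams[0], U[:, 0]                  # one eigenpair of the small problem

S = X @ (L_b - alpha * L_w) @ X.T          # S_b^L - alpha S_w^L, d x d
print(np.allclose(S @ (X @ u), lam * (X @ u)))   # True
```

This is also why the eigenproblem can be solved on the smaller \(n\times n\) matrix when \(n\ll d\).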
Lemma 1 Let the columns of \(R_{w}\) and \(\tilde{R}_{w}\) be the orthonormal eigenvectors of \(\hat{{S}}_{w}^{L}\) corresponding to its non-zero and zero eigenvalues, respectively. Then \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) is positive definite.
Proof
Clearly, the space spanned by \(\tilde{R}_{w}\) is the null space of \(\hat{{S}}_{w}^{L}\). For an arbitrary vector \(\mathbf{a}\ne 0\), we have
Since \(\hat{{S}}_{t}^{L}\) is positive definite, \(\tilde{R}_{w}^{T} \hat{{S}}_{t}^{L} \tilde{R}_{w}\) is also positive definite and
By combining Eqs. (9), (10) and \(\tilde{R}_{w}^{T} \hat{{S}}_{t}^{L} \tilde{R}_{w} =\tilde{R}_{w}^{T} \hat{{S}}_{w}^{L} \tilde{R}_{w} +\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\), we get
Hence \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) is positive definite. \(\square \)
Theorem 2 \(\delta (S_{b}^{L} -\alpha S_{w}^{L} )\ge d-r_{t}\), where \(r_{t} =rank(S_{t}^{L})\).
Proof
Since column vectors in \(\tilde{Q}_{t}\) are the eigenvectors of \(S_{t}^{L}\) corresponding to zero eigenvalues, we have
Since \(S_{w}^{L}\) and \(S_{b}^{L}\) are both positive semi-definite and \(S_{t}^{L} =S_{w}^{L} +S_{b}^{L}\), we can obtain
Then we have
Note that \(\tilde{Q}_{t}\) is a \(d\times (d-r_{t})\) matrix. Then we can obtain \(\delta (S_{b}^{L} -\alpha S_{w}^{L})\ge d-r_{t}\). \(\square \)
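Theorem 2 can be illustrated numerically. The sketch below again assumes the graph-scatter form \(S^{L}=XLX^{T}\) with positive semi-definite graph matrices; dimensions and ranks are arbitrary choices for the experiment:

```python
# Numerical illustration of Theorem 2: the number of zero eigenvalues of
# S_b^L - alpha S_w^L is at least d - rank(S_t^L).  With n samples in a
# d-dimensional space (d > n), S_t^L is rank deficient.
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 8, 3, 0.5
X = rng.standard_normal((d, n))
B = rng.standard_normal((n, n)); L_b = B @ B.T   # PSD graph matrix
C = rng.standard_normal((n, n)); L_w = C @ C.T   # PSD graph matrix
S_b = X @ L_b @ X.T
S_w = X @ L_w @ X.T

r_t = np.linalg.matrix_rank(S_b + S_w)           # rank of S_t^L (here <= n)
vals = np.linalg.eigvalsh(S_b - alpha * S_w)
delta = int(np.sum(np.abs(vals) < 1e-8))         # count of zero eigenvalues
print(delta >= d - r_t)   # True
```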
Lemma 3 Let \(\lambda \) be an eigenvalue of \(\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L}\). Then \(\lambda \) is also an eigenvalue of \(S_{b}^{L} -\alpha S_{w}^{L}\).
Proof
Let \(\varphi \) be an eigenvector of the matrix \(\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L}\) corresponding to the eigenvalue \(\lambda \); then we have
From Eq. (14) we can obtain
By combining Eqs. (15) and (16) we have
Let \(P=[{Q_{t}}\quad {\tilde{Q}_{t}}]\); then we have
Since \(P\) is a unitary matrix, we have
From Eq. (19) we know \(\lambda \) is also an eigenvalue of \(S_{b}^{L} -\alpha S_{w}^{L}\). \(\square \)
Theorem 3 \(\pi (S_{b}^{L} -\alpha S_{w}^{L} )\ge r_{t} -r_{w} ,\nu (S_{b}^{L} -\alpha S_{w}^{L} )\ge r_{t} -r_{b}\), where \(r_{b} =rank\left( {S_{b}^{L}} \right) \) and \(r_{w} =rank(S_{w}^{L})\), respectively.
Proof
Since \(r_{w} =rank(S_{w}^{L})\), \(R_{w}\) is an \(r_{t} \times r_{w}\) matrix and \(\tilde{R}_{w}\) is an \(r_{t} \times (r_{t} -r_{w})\) matrix. Let \(P=[\tilde{R}_{w} \quad R_{w}]\); then we have
where \(C=\tilde{R}_{w}^{T} (\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )R_{w} \) is a \((r_{t} -r_{w} )\times r_{w}\) matrix and \(D=R_{w}^{T} (\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )R_{w}\) is an \(r_{w} \times r_{w}\) matrix. From Lemma 1, we know that \(\tilde{R}_w^T \hat{{S}}_b^L \tilde{R}_w \) is positive definite, so \((\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w} )^{-1}\) exists. Let \(\tilde{P}\) be a matrix defined as
where \(I_{1}\) is a \((r_{t} -r_{w} )\times (r_{t} -r_{w})\) identity matrix, \(I_{2}\) is an \(r_{w} \times r_{w}\) identity matrix, and \(O\) is an \(r_{w} \times (r_{t} -r_{w})\) zero matrix. Then we have
Note that \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) is a positive definite \((r_{t} -r_{w} )\times (r_{t} -r_{w})\) matrix; from Eq. (22) we can obtain \(\pi (\tilde{P}^{T}P^{T}(\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )P\tilde{P})\ge r_{t} -r_{w}\). Since \(P\) and \(\tilde{P}\) are both non-degenerate matrices, from Lemma 2 we can obtain \(\pi (\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )\ge r_{t} -r_{w}\). By Lemma 3 we then obtain \(\pi (S_{b}^{L} -\alpha S_{w}^{L} )\ge r_{t} -r_{w}\).
Following the similar deduction as \(\pi (S_{b}^{L} -\alpha S_{w}^{L})\ge r_{t} -r_{w}\), we can get \(\nu (S_{b}^{L} -\alpha S_{w}^{L} ) \ge r_{t}-r_{b}\) if we use \(R_{b}\) and \(\tilde{R}_{b}\) to substitute \(R_{w}\) and \(\tilde{R}_{w}\), respectively. \(\square \)
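The inertia bounds of Theorem 3 can also be illustrated numerically. The sketch below assumes the graph-scatter form \(S^{L}=XLX^{T}\) with positive semi-definite graph matrices; the chosen ranks and sizes are arbitrary:

```python
# Numerical illustration of Theorem 3: the difference matrix has at
# least r_t - r_w positive and at least r_t - r_b negative eigenvalues.
# Rank-deficient graph matrices are used so the bounds are non-trivial.
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha = 8, 3, 0.5
X = rng.standard_normal((d, n))
B = rng.standard_normal((n, 2)); L_b = B @ B.T   # rank-2 PSD graph matrix
C = rng.standard_normal((n, 1)); L_w = C @ C.T   # rank-1 PSD graph matrix
S_b = X @ L_b @ X.T
S_w = X @ L_w @ X.T

r_b = np.linalg.matrix_rank(S_b)         # 2 (generically)
r_w = np.linalg.matrix_rank(S_w)         # 1 (generically)
r_t = np.linalg.matrix_rank(S_b + S_w)   # 3 (generically)

vals = np.linalg.eigvalsh(S_b - alpha * S_w)
pi = int(np.sum(vals > 1e-8))    # number of positive eigenvalues
nu = int(np.sum(vals < -1e-8))   # number of negative eigenvalues
print(pi >= r_t - r_w, nu >= r_t - r_b)
```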
Lemma 4 If \(S_{w}^{L}\) is nonsingular, then the function \(f(\alpha )\) is strictly monotonically decreasing.
Proof
Let \(0<\alpha _{1} <\alpha _{2}\), and let \(\mathbf{w}_{1}\) and \(\mathbf{w}_{2}\) be the eigenvectors corresponding to the maximal eigenvalues of \((S_{b}^{L} -\alpha _{1} S_{w}^{L})\) and \((S_{b}^{L} -\alpha _{2} S_{w}^{L})\), respectively. Then we have
Since \(S_{w}^{L}\) is nonsingular, we have \(\mathbf{w}_{2}^{T}S_{w}^{L} \mathbf{w}_{2} >0\), so \((\alpha _{2} -\alpha _{1} )\mathbf{w}_{2}^{T}S_{w}^{L} \mathbf{w}_{2} >0\) and \(f(\alpha _{1} )>f(\alpha _{2})\). This proves that \(f(\alpha )\) is strictly monotonically decreasing. \(\square \)
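Taking \(f(\alpha )\) to be the maximal eigenvalue of \(S_{b}^{L} -\alpha S_{w}^{L}\), Lemma 4 can be observed numerically; the matrices below are synthetic stand-ins:

```python
# Numerical illustration of Lemma 4: with a nonsingular (positive
# definite) S_w^L, f(alpha) = max eigenvalue of S_b^L - alpha S_w^L
# is strictly decreasing in alpha.
import numpy as np

rng = np.random.default_rng(4)
d = 5
A = rng.standard_normal((d, d))
S_w = A @ A.T + 0.1 * np.eye(d)   # symmetric positive definite
B = rng.standard_normal((d, 3))
S_b = B @ B.T                     # symmetric positive semi-definite

def f(alpha):
    return np.linalg.eigvalsh(S_b - alpha * S_w).max()

print(f(0.5) > f(1.0) > f(2.0))   # True
```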
Theorem 4 The eigenvector corresponding to the maximal eigenvalue of \(S_{b}^{L} -\alpha _{0} S_{w}^{L}\) is equivalent to the eigenvector corresponding to the maximal eigenvalue of \((S_{w}^{L} )^{-1}S_{b}^{L}\).
Proof
Suppose \(\mathbf{w}^{{*}}\) is the eigenvector corresponding to the maximal eigenvalue of \((S_{w}^{L} )^{-1}S_{b}^{L}\) and \(c_{1} =\frac{(\mathbf{w}^{{*}})^{T}S_{b}^{L} \mathbf{w}^{{*}}}{(\mathbf{w}^{{*}})^{T}S_{w}^{L} \mathbf{w}^{{*}}}\), where \(c_{1}\) is also the maximal eigenvalue of \((S_{w}^{L} )^{-1}S_{b}^{L}\). Then we have
and
where \(\mathbf{w}\) is an arbitrary projection vector. By combining Eqs. (24) and (25) we have
From Eq. (26) we know \(\mathbf{w}^{{*}}\) is also the eigenvector corresponding to the maximal eigenvalue of \(S_{b}^{L} -c_{1} S_{w}^{L}\). That is
Since the zero point of \(f(\alpha )\) is unique, we have \(c_{1} =\alpha _{0}\). \(\square \)
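Theorem 4 can be verified numerically: the top eigenvector \(\mathbf{w}^{*}\) of \((S_{w}^{L})^{-1}S_{b}^{L}\), with eigenvalue \(c_{1}\), attains the maximal eigenvalue (namely 0) of \(S_{b}^{L} -c_{1} S_{w}^{L}\). The matrices below are synthetic:

```python
# Numerical check of Theorem 4: for c1 = top eigenvalue of inv(S_w) S_b,
# the corresponding eigenvector w* satisfies (S_b - c1 S_w) w* = 0, and
# 0 is the maximal eigenvalue of S_b - c1 S_w (i.e. f(c1) = 0).
import numpy as np

rng = np.random.default_rng(5)
d = 5
A = rng.standard_normal((d, d))
S_w = A @ A.T + 0.1 * np.eye(d)   # nonsingular within-class scatter
B = rng.standard_normal((d, 3))
S_b = B @ B.T                     # between-class scatter (PSD)

M = np.linalg.inv(S_w) @ S_b
vals, vecs = np.linalg.eig(M)     # eigenvalues are real since S_w > 0, S_b >= 0
i = np.argmax(vals.real)
c1, w = vals[i].real, vecs[:, i].real

# w is an eigenvector of S_b - c1 S_w with eigenvalue 0 ...
print(np.allclose((S_b - c1 * S_w) @ w, 0, atol=1e-8))
# ... and 0 is the maximal eigenvalue of S_b - c1 S_w
print(np.isclose(np.linalg.eigvalsh(S_b - c1 * S_w).max(), 0, atol=1e-8))
```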
Lu, GF., Wang, Y. & Zou, J. Graph Maximum Margin Criterion for Face Recognition. Neural Process Lett 44, 387–405 (2016). https://doi.org/10.1007/s11063-015-9464-z