
Graph Maximum Margin Criterion for Face Recognition

Abstract

In this paper, we propose a graph maximum margin criterion (GMMC), which provides a unified way of overcoming the small sample size problem encountered by algorithms formulated in the general graph embedding framework. The proposed GMMC-based feature extraction algorithms compute the discriminant vectors by maximizing the difference between the graph between-class scatter matrix and the graph within-class scatter matrix, so the singularity problem is avoided. An efficient and stable algorithm for implementing GMMC is also proposed, and the eigenvalue distribution of GMMC is analyzed. Experiments on the ORL, PIE and AR face databases show the effectiveness of the proposed GMMC-based feature extraction methods.



Acknowledgments

This research is supported by the National Natural Science Foundation of China (No. 61572033), the Natural Science Foundation of the Education Department of Anhui Province of China (No. KJ2015ZD08), and the Anhui Provincial Natural Science Foundation (No. 1308085MF95).

Author information

Corresponding author

Correspondence to Gui-Fu Lu.

Appendix

Theorem 1 Let \(u\in {\mathbb {R}}^{n\times 1}\) be an eigenvector of the matrix \((L^{b}-\alpha L^{w})X^{T}X\) corresponding to the eigenvalue \(\lambda \). Then \(Xu\) is an eigenvector of the matrix \(S_{b}^{L} -\alpha S_{w}^{L}\) corresponding to the same eigenvalue \(\lambda \).

Proof

Since \(u\in {\mathbb {R}}^{n\times 1}\) is an eigenvector of the matrix \((L^{b}-\alpha L^{w})X^{T}X\) corresponding to the eigenvalue \(\lambda \), we have

$$\begin{aligned} \left( L^{b}-\alpha L^{w}\right) X^{T}Xu&=\lambda u \\ \Rightarrow X\left( L^{b}-\alpha L^{w}\right) X^{T}Xu&=\lambda Xu \\ \Rightarrow \left( S_{b}^{L} -\alpha S_{w}^{L}\right) Xu&=\lambda Xu \\ \end{aligned}$$

\(\square \)
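
As a sanity check, the following is a minimal numerical sketch of Theorem 1 (not the authors' code). It assumes, as the proof above indicates, that \(S_{b}^{L} =XL^{b}X^{T}\) and \(S_{w}^{L} =XL^{w}X^{T}\); the random graph Laplacians, the matrix sizes and the value of \(\alpha \) are placeholder choices.

```python
# Numerical sketch of Theorem 1 (illustrative only).  Assumption: as in the
# proof above, S_b = X L_b X^T and S_w = X L_w X^T; L_b and L_w below are
# Laplacians of random weighted graphs, standing in for the between-/within-
# class graph Laplacians.
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 50, 20, 2.0

def random_laplacian(n, rng):
    # Laplacian of a random weighted graph: symmetric positive semi-definite.
    W = rng.random((n, n))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

X = rng.standard_normal((d, n))
L_b, L_w = random_laplacian(n, rng), random_laplacian(n, rng)
S_b, S_w = X @ L_b @ X.T, X @ L_w @ X.T

# Any eigenpair (lam0, u) of the small n x n (generally non-symmetric) problem.
lam, U = np.linalg.eig((L_b - alpha * L_w) @ X.T @ X)
lam0, u = lam[0], U[:, 0]

# Then X u is an eigenvector of S_b - alpha*S_w with the same eigenvalue.
v = X @ u
resid = np.linalg.norm((S_b - alpha * S_w) @ v - lam0 * v)
print(resid / np.linalg.norm(v) < 1e-6)   # True
```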

Lemma 1 Let \(R_{w}\) and \(\tilde{R}_{w}\) be the matrices whose columns are the orthonormal eigenvectors of \(\hat{{S}}_{w}^{L}\) corresponding to its non-zero and zero eigenvalues, respectively. Then \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) is positive definite.

Proof

The space spanned by the columns of \(\tilde{R}_{w}\) is the null space of \(\hat{{S}}_{w}^{L}\). Hence, for an arbitrary vector \(\mathbf{a}\ne 0\), we have

$$\begin{aligned} \mathbf{a}^{T}\tilde{R}_{w}^{T} \hat{{S}}_{w}^{L} \tilde{R}_{w} \mathbf{a}=0 \end{aligned}$$
(9)

Since \(\hat{{S}}_{t}^{L}\) is positive definite and \(\tilde{R}_{w}\) has full column rank, \(\tilde{R}_{w}^{T} \hat{{S}}_{t}^{L} \tilde{R}_{w}\) is also positive definite, so

$$\begin{aligned} \mathbf{a}^{T}\tilde{R}_{w}^{T} \hat{{S}}_{t}^{L} \tilde{R}_{w} \mathbf{a}>0 \end{aligned}$$
(10)

By combining Eqs. (9) and (10) with \(\tilde{R}_{w}^{T} \hat{{S}}_{t}^{L} \tilde{R}_{w} =\tilde{R}_{w}^{T} \hat{{S}}_{w}^{L} \tilde{R}_{w} +\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\), we get

$$\begin{aligned} \mathbf{a}^{T}\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w} \mathbf{a}>0 \end{aligned}$$
(11)

Since \(\mathbf{a}\) is arbitrary, \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) is positive definite. \(\square \)
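
A small numerical illustration of Lemma 1 follows (a sketch, not the authors' code). The factor widths below are arbitrary placeholders; they only serve to make \(\hat{{S}}_{w}^{L}\) rank deficient while keeping \(\hat{{S}}_{t}^{L}\) positive definite.

```python
# Numerical sketch of Lemma 1 (illustrative only): with S_w_hat rank deficient,
# S_b_hat positive semi-definite, and S_t_hat = S_w_hat + S_b_hat positive
# definite, the restriction of S_b_hat to the null space of S_w_hat is
# positive definite.
import numpy as np

rng = np.random.default_rng(1)
r_t = 10
A = rng.standard_normal((r_t, 6))   # rank-6 factor -> S_w_hat has a 4-dim null space
B = rng.standard_normal((r_t, 7))   # rank-7 factor -> S_t_hat is (generically) PD
S_w_hat, S_b_hat = A @ A.T, B @ B.T
assert np.linalg.eigvalsh(S_w_hat + S_b_hat).min() > 1e-8   # S_t_hat is PD

# Columns of R_w_tilde: orthonormal eigenvectors of S_w_hat with zero eigenvalue.
evals, evecs = np.linalg.eigh(S_w_hat)
R_w_tilde = evecs[:, evals < 1e-10 * evals.max()]

M = R_w_tilde.T @ S_b_hat @ R_w_tilde
print(R_w_tilde.shape[1], np.linalg.eigvalsh(M).min() > 0)   # 4 True
```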

Theorem 2 \(\delta (S_{b}^{L} -\alpha S_{w}^{L} )\ge d-r_{t}\), where \(r_{t} =\mathrm{rank}(S_{t}^{L})\) and \(\delta (\cdot )\) denotes the number of zero eigenvalues.

Proof

Since the column vectors of \(\tilde{Q}_{t}\) are the eigenvectors of \(S_{t}^{L}\) corresponding to its zero eigenvalues, we have

$$\begin{aligned} S_{t}^{L} \tilde{Q}_{t} =0, \tilde{Q}_{t}^{T} S_{t}^{L} \tilde{Q}_{t} =0 \end{aligned}$$
(12)

Since \(S_{w}^{L}\) and \(S_{b}^{L}\) are both positive semi-definite and \(S_{t}^{L} =S_{w}^{L} +S_{b}^{L}\), we can obtain

$$\begin{aligned} \tilde{Q}_{t}^{T} S_{t}^{L} \tilde{Q}_{t} =0&\Leftrightarrow \tilde{Q}_{t}^{T} \left( S_{w}^{L} +S_{b}^{L} \right) \tilde{Q}_{t} =0\Leftrightarrow \tilde{Q}_{t}^{T} S_{w}^{L} \tilde{Q}_{t} +\tilde{Q}_{t}^{T} S_{b}^{L} \tilde{Q}_{t} =0 \nonumber \\&\Leftrightarrow \tilde{Q}_{t}^{T} S_{w}^{L} \tilde{Q}_{t} =0,\ \tilde{Q}_{t}^{T} S_{b}^{L} \tilde{Q}_{t} =0 \end{aligned}$$
(13)

Then we have

$$\begin{aligned} S_{w}^{L} \tilde{Q}_{t} =0, S_{b}^{L} \tilde{Q}_{t} =0, \left( S_{b}^{L} -\alpha S_{w}^{L} \right) \tilde{Q}_{t} =0 \end{aligned}$$
(14)

Note that \(\tilde{Q}_{t}\) is a \(d\times (d-r_{t})\) matrix with orthonormal columns, so Eq. (14) provides \(d-r_{t}\) linearly independent eigenvectors of \(S_{b}^{L} -\alpha S_{w}^{L}\) with eigenvalue zero. Hence \(\delta (S_{b}^{L} -\alpha S_{w}^{L})\ge d-r_{t}\). \(\square \)
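
The bound of Theorem 2 can be checked numerically. The sketch below (not from the paper) again assumes \(S_{b}^{L} =XL^{b}X^{T}\), \(S_{w}^{L} =XL^{w}X^{T}\) and \(S_{t}^{L} =S_{b}^{L} +S_{w}^{L}\), reads \(\delta (\cdot )\) as the number of zero eigenvalues, and uses \(d>n\) so that \(S_{t}^{L}\) is rank deficient.

```python
# Numerical sketch of Theorem 2 (illustrative only): the number of zero
# eigenvalues of S_b - alpha*S_w is at least d - rank(S_t).
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 60, 15, 1.5

def random_laplacian(n, rng):
    W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

X = rng.standard_normal((d, n))
S_b = X @ random_laplacian(n, rng) @ X.T
S_w = X @ random_laplacian(n, rng) @ X.T
S_t = S_b + S_w

M = S_b - alpha * S_w
tol = 1e-8 * np.linalg.norm(M, 2)
n_zero = int(np.sum(np.abs(np.linalg.eigvalsh(M)) < tol))
r_t = np.linalg.matrix_rank(S_t)
print(n_zero, d - r_t, n_zero >= d - r_t)   # e.g. 46 46 True
```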

Lemma 3 Let \(\lambda \) be an eigenvalue of \(\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L}\). Then \(\lambda \) is also an eigenvalue of \(S_{b}^{L} -\alpha S_{w}^{L}\).

Proof

Let \(\varphi \) be an eigenvector of the matrix \(\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L}\) corresponding to the eigenvalue \(\lambda \). Then we have

$$\begin{aligned}&\left( \hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} \right) \varphi =\lambda \varphi \Rightarrow \nonumber \\&\quad Q_{t} Q_{t}^{T} \left( S_{b}^{L} -\alpha S_{w}^{L} \right) Q_{t} \varphi =\lambda Q_{t} \varphi \end{aligned}$$
(15)

From Eq. (14) we can obtain

$$\begin{aligned} \tilde{Q}_{t} \tilde{Q}_{t}^{T} S_{b}^{L} =0, \tilde{Q}_{t} \tilde{Q}_{t}^{T} S_{w}^{L} =0 \end{aligned}$$
(16)

By combining Eqs. (15) and (16) we have

$$\begin{aligned} \left( Q_{t} Q_{t}^{T} + \tilde{Q}_{t} \tilde{Q}_{t}^{T} \right) \left( S_{b}^{L} -\alpha S_{w}^{L} \right) Q_{t} \varphi =\lambda Q_{t} \varphi \end{aligned}$$
(17)

Let \(P=[Q_{t}\ \ \tilde{Q}_{t}]\); then we have

$$\begin{aligned} PP^{T}=Q_{t} Q_{t}^{T} +\tilde{Q}_{t} \tilde{Q}_{t}^{T} \end{aligned}$$
(18)

Note that \(P\) is a unitary matrix, so \(PP^{T}=I\) and Eq. (17) reduces to

$$\begin{aligned} \left( S_{b}^{L} -\alpha S_{w}^{L} \right) Q_{t} \varphi =\lambda Q_{t} \varphi \end{aligned}$$
(19)

From Eq. (19) we know \(\lambda \) is also an eigenvalue of \(S_{b}^{L} -\alpha S_{w}^{L}\). \(\square \)
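
A quick numerical check of Lemma 3 follows (a sketch under the same assumptions as above, with \(\hat{{S}}_{b}^{L} =Q_{t}^{T}S_{b}^{L}Q_{t}\) and \(\hat{{S}}_{w}^{L} =Q_{t}^{T}S_{w}^{L}Q_{t}\), as the proof implies): every eigenvalue of the reduced matrix reappears in the spectrum of the full matrix.

```python
# Numerical sketch of Lemma 3 (illustrative only).  Assumption: S_hat denotes
# Q_t^T S Q_t, where the columns of Q_t are the eigenvectors of S_t belonging
# to its non-zero eigenvalues.
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha = 40, 12, 1.5

def random_laplacian(n, rng):
    W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

X = rng.standard_normal((d, n))
S_b = X @ random_laplacian(n, rng) @ X.T
S_w = X @ random_laplacian(n, rng) @ X.T

evals_t, evecs_t = np.linalg.eigh(S_b + S_w)
Q_t = evecs_t[:, evals_t > 1e-8 * evals_t.max()]

small = np.linalg.eigvalsh(Q_t.T @ (S_b - alpha * S_w) @ Q_t)
big = np.linalg.eigvalsh(S_b - alpha * S_w)
# Every eigenvalue of the reduced matrix appears in the spectrum of the full one.
print(all(np.abs(big - lam).min() < 1e-6 * max(1.0, abs(lam)) for lam in small))  # True
```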

Theorem 3 \(\pi (S_{b}^{L} -\alpha S_{w}^{L} )\ge r_{t} -r_{w}\) and \(\nu (S_{b}^{L} -\alpha S_{w}^{L} )\ge r_{t} -r_{b}\), where \(r_{b} =\mathrm{rank}(S_{b}^{L})\) and \(r_{w} =\mathrm{rank}(S_{w}^{L})\), and \(\pi (\cdot )\) and \(\nu (\cdot )\) denote the numbers of positive and negative eigenvalues, respectively.

Proof

Since \(r_{w} =\mathrm{rank}(S_{w}^{L})\), \(R_{w}\) is an \(r_{t} \times r_{w}\) matrix and \(\tilde{R}_{w}\) is an \(r_{t} \times (r_{t} -r_{w})\) matrix. Let \(P=[\tilde{R}_{w}\ \ R_{w}]\); then we have

$$\begin{aligned} P^{T}\left( \hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} \right) P=\left[ {\begin{array}{cc} {\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}}&{} C \\ {C^{T}}&{} D \\ \end{array} }\right] \end{aligned}$$
(20)

where \(C=\tilde{R}_{w}^{T} (\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )R_{w}\) is an \((r_{t} -r_{w} )\times r_{w}\) matrix and \(D=R_{w}^{T} (\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )R_{w}\) is an \(r_{w} \times r_{w}\) matrix; the \((1,1)\) block simplifies to \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) because \(\hat{{S}}_{w}^{L} \tilde{R}_{w} =0\). From Lemma 1 we know that \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) is positive definite, so \((\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w} )^{-1}\) exists. Let \(\tilde{P}\) be the matrix defined as

$$\begin{aligned} \tilde{P}=\left[ {\begin{array}{cc} {I_{1}}&{}\quad {-\left( \tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\right) ^{-1}C} \\ O&{}\quad {I_{2}} \\ \end{array} }\right] \end{aligned}$$
(21)

where \(I_{1}\) is an \((r_{t} -r_{w} )\times (r_{t} -r_{w})\) identity matrix, \(I_{2}\) is an \(r_{w} \times r_{w}\) identity matrix, and \(O\) is an \(r_{w} \times (r_{t} -r_{w})\) zero matrix. Then we have

$$\begin{aligned} \tilde{P}^{T}P^{T}\left( \hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} \right) P\tilde{P}=\left[ {\begin{array}{cc} {\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w} }&{} {O^{T}} \\ O&{} {D-C^{T}\left( \tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w} \right) ^{-1}C} \\ \end{array} }\right] \end{aligned}$$
(22)

Since \(\tilde{R}_{w}^{T} \hat{{S}}_{b}^{L} \tilde{R}_{w}\) is a positive definite \((r_{t} -r_{w} )\times (r_{t} -r_{w})\) matrix, from Eq. (22) we obtain \(\pi (\tilde{P}^{T}P^{T}(\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )P\tilde{P})\ge r_{t} -r_{w}\). Since \(P\) and \(\tilde{P}\) are both non-degenerate matrices, Lemma 2 gives \(\pi (\hat{{S}}_{b}^{L} -\alpha \hat{{S}}_{w}^{L} )\ge r_{t} -r_{w}\), and Lemma 3 then gives \(\pi (S_{b}^{L} -\alpha S_{w}^{L} )\ge r_{t} -r_{w}\).

Following a similar deduction with \(R_{b}\) and \(\tilde{R}_{b}\) in place of \(R_{w}\) and \(\tilde{R}_{w}\), we obtain \(\nu (S_{b}^{L} -\alpha S_{w}^{L} )\ge r_{t} -r_{b}\). \(\square \)
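
The inertia bounds of Theorem 3 are easy to illustrate numerically. The sketch below (not the authors' code) uses generic low-rank positive semi-definite matrices in place of \(S_{b}^{L}\) and \(S_{w}^{L}\) and reads \(\pi (\cdot )\) and \(\nu (\cdot )\) as the counts of positive and negative eigenvalues.

```python
# Numerical sketch of Theorem 3 (illustrative only): for PSD S_b, S_w with
# S_t = S_b + S_w, the matrix S_b - alpha*S_w has at least r_t - r_w positive
# and at least r_t - r_b negative eigenvalues.
import numpy as np

rng = np.random.default_rng(4)
d, alpha = 30, 2.0
B = rng.standard_normal((d, 10)); S_b = B @ B.T   # PSD, rank 10
A = rng.standard_normal((d, 18)); S_w = A @ A.T   # PSD, rank 18

r_b = np.linalg.matrix_rank(S_b)
r_w = np.linalg.matrix_rank(S_w)
r_t = np.linalg.matrix_rank(S_b + S_w)

evals = np.linalg.eigvalsh(S_b - alpha * S_w)
tol = 1e-8 * np.abs(evals).max()
n_pos, n_neg = int(np.sum(evals > tol)), int(np.sum(evals < -tol))
print(n_pos >= r_t - r_w, n_neg >= r_t - r_b)   # True True
```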

Lemma 4 If \(S_{w}^{L}\) is nonsingular, then the function \(f(\alpha )\) is strictly monotonically decreasing.

Proof

Let \(0<\alpha _{1} <\alpha _{2}\), and let \(\mathbf{w}_{1}\) and \(\mathbf{w}_{2}\) be the unit-norm eigenvectors corresponding to the maximal eigenvalues of \(S_{b}^{L} -\alpha _{1} S_{w}^{L}\) and \(S_{b}^{L} -\alpha _{2} S_{w}^{L}\), respectively. Then we have

$$\begin{aligned} f(\alpha _{1} )&=\mathbf{w}_{1}^{T} \left( S_{b}^{L} -\alpha _{1} S_{w}^{L} \right) \mathbf{w}_{1} \ge \mathbf{w}_{2}^{T}\left( S_{b}^{L} -\alpha _{1} S_{w}^{L}\right) \mathbf{w}_{2} \nonumber \\&=\mathbf{w}_{2}^{T}\left( S_{b}^{L} -\alpha _{2} S_{w}^{L} \right) \mathbf{w}_{2} +(\alpha _{2} -\alpha _{1} )\mathbf{w}_{2}^{T}S_{w}^{L} \mathbf{w}_{2} \nonumber \\&=f(\alpha _{2} )+ \left( \alpha _{2} -\alpha _{1} \right) \mathbf{w}_{2}^{T}S_{w}^{L} \mathbf{w}_{2} \end{aligned}$$
(23)

Since \(S_{w}^{L}\) is positive semi-definite and nonsingular, it is positive definite, so \(\mathbf{w}_{2}^{T}S_{w}^{L} \mathbf{w}_{2} >0\) and \((\alpha _{2} -\alpha _{1} )\mathbf{w}_{2}^{T}S_{w}^{L} \mathbf{w}_{2} >0\); hence \(f(\alpha _{1} )>f(\alpha _{2})\). This proves that \(f(\alpha )\) is strictly monotonically decreasing. \(\square \)
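
Lemma 4 can be visualised with a few lines of code. The sketch below assumes, consistently with the proof, that \(f(\alpha )\) is the largest eigenvalue of \(S_{b}^{L} -\alpha S_{w}^{L}\) (with unit-norm eigenvectors); the matrices and the grid of \(\alpha \) values are placeholders.

```python
# Numerical sketch of Lemma 4 (illustrative only): f(alpha), taken here as the
# largest eigenvalue of S_b - alpha*S_w, is strictly decreasing when S_w is
# nonsingular.
import numpy as np

rng = np.random.default_rng(5)
d = 25
B = rng.standard_normal((d, 8));  S_b = B @ B.T                 # PSD, rank 8
A = rng.standard_normal((d, d));  S_w = A @ A.T + np.eye(d)     # PD, nonsingular

def f(alpha):
    return np.linalg.eigvalsh(S_b - alpha * S_w)[-1]   # largest eigenvalue

alphas = np.linspace(0.1, 5.0, 50)
values = np.array([f(a) for a in alphas])
print(np.all(np.diff(values) < 0))   # True: f is strictly decreasing
```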

Theorem 4 The eigenvector corresponding to the maximal eigenvalue of \(S_{b}^{L} -\alpha _{0} S_{w}^{L}\), where \(\alpha _{0}\) is the unique zero point of \(f(\alpha )\), is the same as the eigenvector corresponding to the maximal eigenvalue of \((S_{w}^{L} )^{-1}S_{b}^{L}\).

Proof

Let \(\mathbf{w}^{{*}}\) be the eigenvector corresponding to the maximal eigenvalue of \((S_{w}^{L} )^{-1}S_{b}^{L}\) and let \(c_{1} =\frac{(\mathbf{w}^{{*}})^{T}S_{b}^{L} \mathbf{w}^{{*}}}{(\mathbf{w}^{{*}})^{T}S_{w}^{L} \mathbf{w}^{{*}}}\); then \(c_{1}\) is the maximal eigenvalue of \((S_{w}^{L} )^{-1}S_{b}^{L}\). We have

$$\begin{aligned} (\mathbf{w}^{{*}})^{T}S_{b}^{L} \mathbf{w}^{{*}}-c_{1} (\mathbf{w}^{{*}})^{T}S_{w}^{L} \mathbf{w}^{{*}}=0 \end{aligned}$$
(24)

and

$$\begin{aligned} \frac{\mathbf{w}^{T}S_{b}^{L} \mathbf{w}}{\mathbf{w}^{T}S_{w}^{L} \mathbf{w}}\le c_{1} , \mathbf{w}^{T}S_{b}^{L} \mathbf{w}-c_{1} \mathbf{w}^{T}S_{w}^{L} \mathbf{w}\le 0 \end{aligned}$$
(25)

where \(\mathbf{w}\) is an arbitrary projection vector. By combining Eqs. (24) and (25) we have

$$\begin{aligned} (\mathbf{w}^{{*}})^{T}S_{b}^{L} \mathbf{w}^{{*}}-c_{1} (\mathbf{w}^{{*}})^{T}S_{w}^{L} \mathbf{w}^{{*}}=0\ge \mathbf{w}^{T}S_{b}^{L} \mathbf{w}-c_{1} \mathbf{w}^{T}S_{w}^{L} \mathbf{w} \end{aligned}$$
(26)

From Eq. (26) we know \(\mathbf{w}^{{*}}\) is also the eigenvector corresponding to the maximal eigenvalue of \(S_{b}^{L} -c_{1} S_{w}^{L}\). That is

$$\begin{aligned} f(c_{1} )=(\mathbf{w}^{{*}})^{T}S_{b}^{L} \mathbf{w}^{{*}}-c_{1} (\mathbf{w}^{{*}})^{T}S_{w}^{L} \mathbf{w}^{{*}}=0 \end{aligned}$$
(27)

Since the zero point of \(f(\alpha )\) is unique, we have \(c_{1} =\alpha _{0}\). \(\square \)
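
Finally, Theorem 4 can be checked numerically: locate \(\alpha _{0}\) as the zero of \(f(\alpha )\) by bisection (legitimate, since \(f\) is continuous and, by Lemma 4, strictly decreasing) and compare the top eigenvectors. The sketch below is illustrative only; the matrices and the bisection bracket are placeholder choices.

```python
# Numerical sketch of Theorem 4 (illustrative only): with alpha_0 the unique
# zero of f(alpha) = lambda_max(S_b - alpha*S_w), the top eigenvector of
# S_b - alpha_0*S_w coincides (up to sign) with the top eigenvector of
# S_w^{-1} S_b, whose largest eigenvalue equals alpha_0.
import numpy as np

rng = np.random.default_rng(6)
d = 20
B = rng.standard_normal((d, 6));  S_b = B @ B.T                 # PSD
A = rng.standard_normal((d, d));  S_w = A @ A.T + np.eye(d)     # PD, nonsingular

def f(alpha):
    return np.linalg.eigvalsh(S_b - alpha * S_w)[-1]

# Bisection for alpha_0: f(0) > 0 and f decreases without bound.
lo, hi = 0.0, 1000.0
for _ in range(200):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
alpha_0 = (lo + hi) / 2

# Top eigenvector of S_b - alpha_0*S_w ...
w1 = np.linalg.eigh(S_b - alpha_0 * S_w)[1][:, -1]
# ... and dominant eigenpair of S_w^{-1} S_b (its spectrum is real).
lam, V = np.linalg.eig(np.linalg.solve(S_w, S_b))
k = np.argmax(lam.real)
w2 = V[:, k].real
w2 /= np.linalg.norm(w2)

print(abs(lam.real[k] - alpha_0) < 1e-6, abs(abs(w1 @ w2) - 1.0) < 1e-6)   # True True
```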


Cite this article

Lu, GF., Wang, Y. & Zou, J. Graph Maximum Margin Criterion for Face Recognition. Neural Process Lett 44, 387–405 (2016). https://doi.org/10.1007/s11063-015-9464-z
