Pattern Recognition

Volume 50, February 2016, Pages 1-16

Learning discriminative singular value decomposition representation for face recognition

https://doi.org/10.1016/j.patcog.2015.08.010

Highlights

  • A novel singular value decomposition based representation method is provided.

  • An individual basis set for each image is built through the SVD technique.

  • A common set of singular values is learnt via a discriminant criterion.

  • Sequential quadratic programming (SQP) method is used to solve our model.

Abstract

Face representation is a critical step in face recognition. Recently, singular value decomposition (SVD) based representation methods have attracted researchers' attention for their ability to alleviate facial variations. These methods reveal that the SVD basis set is important for recognition, and they regulate the corresponding singular values (SVs) to form a more effective representation image. However, the existing SVD-based representation methods share a common problem: they all regulate the SVs by an empirically chosen rule, which is clearly suboptimal in theory. To address this problem, we propose a novel method named learning discriminative singular value decomposition representation (LDSVDR) for face recognition. We build an individual SVD basis set for each image and then learn a common set of SVs by taking the information in the basis sets into account according to a discriminant criterion across the training images. The proposed model is solved by the sequential quadratic programming (SQP) method. Extensive experiments are conducted on three popular face databases, and the results demonstrate the effectiveness of our method under variations in illumination, occlusion and disguise, as well as on the face sketch recognition task.

Introduction

Face recognition is a classical topic in the computer vision and pattern recognition community because of its great demand in many areas, such as access control, human–machine interaction, law enforcement and surveillance [1], [27]. Although great progress has been made by many researchers, it remains a challenging problem because of the large variations present in face images, e.g., variations in illumination conditions, poses and facial expressions, as well as various noises (i.e., occlusion, corruption and disguise).

In the past few decades, subspace-based methods [4], [5], [6], [7], [8], [9], [10] have received wide attention. As the two most well-known subspace-based methods, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) have achieved great success. PCA is designed to model linear variation in high-dimensional data; its purpose is to find the optimal linear projection that captures the directions of maximum variance in the data. Kirby et al. [4] made the first attempt to apply PCA to face recognition, and the well-known Eigenfaces method was proposed by Turk and Pentland [6]. Differing from PCA, LDA is a supervised learning method that follows the criterion of maximizing the ratio of the between-class scatter to the within-class scatter. Belhumeur et al. [8] proposed the well-known Fisherfaces method, which applies a PCA projection before the LDA procedure to avoid the small sample size problem. There are also derivative methods [7], [30], [31], [32] that improve on these two classical methods. However, subspace-based methods are very sensitive to the large variations present in face images.

Recently, several singular value decomposition (SVD) based representation methods have been proposed for their ability to effectively alleviate facial variations. Liu et al. [15] proposed a fractional order singular value decomposition representation (FSVDR) method, motivated by the observation that the leading SVD bases are sensitive to facial variations. They apply a fractional function to deflate the weights of the variation-sensitive bases and inflate the weights of the variation-insensitive ones. However, since the performance of FSVDR relies on the fractional order parameter α, which is chosen through an exhaustive search, it is unsuitable for real-world applications. Additionally, FSVDR considers only the reconstructive power of each basis and uses all of the bases in the face representation. In fact, the lagging bases are insensitive to facial variations and contain little reconstructive or discriminative information; they may be regarded as noise components and can even hurt recognition performance. From this viewpoint, Lu et al. [20] proposed a dominant singular value decomposition representation (DSVDR) method. Unlike FSVDR, which only uses the reconstructive power of the bases, DSVDR decomposes the singular value spectrum of each face image into three subspaces and regulates the singular values (SVs) of the important bases according to their discriminative and reconstructive power simultaneously. More recently, Zhang et al. [13] proposed a simple but effective method named nearest orthogonal matrix representation (NOMR). In NOMR, the nearest orthogonal matrix of each image is computed as its SVD representation, which keeps the original basis set but sets all of the SVs to 1. The authors consider that the individual basis space corresponds to the essential identity information while the singular values are associated with illumination variations; they therefore regard the SVs as unsuitable for face recognition and directly advocate using the basis set generated via SVD to identify the original face image. Although NOMR achieves some interesting results in alleviating the effects of illumination and heterogeneity, replacing all non-zero singular values with 1 deflates not only the variations in the leading sensitive bases but also the discriminant information they contain. Therefore, NOMR may be unsuitable in some cases. In fact, NOMR can be seen as a special case of FSVDR: when α=0, all of the SVs are regulated to 1 and FSVDR is equivalent to NOMR.
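
To make the difference between these regulation rules concrete, the short NumPy sketch below (our own reading of the cited methods, not the authors' code) contrasts an FSVDR-style rule, which raises each SV to a fractional power α, with a NOMR-style rule, which keeps the basis set and replaces every non-zero SV with 1; the function names and the value of α are illustrative assumptions.

```python
import numpy as np

def fsvdr_like(A, alpha=0.5):
    """FSVDR-style regulation (illustrative): raise every SV to a fractional
    power alpha in (0, 1], which compresses the SV spectrum and thereby
    relatively deflates the leading, variation-sensitive bases."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * s**alpha) @ Vt          # rebuild the image from the regulated SVs

def nomr_like(A):
    """NOMR-style representation (illustrative): keep the basis set but set all
    non-zero SVs to 1, i.e. use the nearest orthogonal matrix U V^T."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

# Toy usage on a random 32 x 32 "image"; alpha = 0.5 is an arbitrary choice here.
A = np.random.rand(32, 32)
print(fsvdr_like(A, alpha=0.5).shape, nomr_like(A).shape)
```

Driving α toward 0 in the first function recovers the second, mirroring the special-case relation noted above.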

Generally speaking, these earlier SVD representations [13], [15], [20] regulate the SVs with empirically chosen rules and do not exploit the information in the SVD basis set, which is clearly not optimal in theory. In fact, an SVD representation is composed of the SVs and the SVD basis set. Each SV specifies the luminance of an image layer while the corresponding basis specifies the geometry of that layer [3]. That is to say, the basis set mostly contains the inherent information of the original face image, which is dominant for face recognition. Therefore, to obtain a better SVD representation, it seems more reasonable to regulate the SVs according to the information in the basis set. To address this problem, we propose a novel method named learning discriminative singular value decomposition representation (LDSVDR) for face recognition. Specifically, we build an individual SVD basis set for each image and then learn a common set of SVs by taking the information in the basis sets into account, according to a discriminant criterion derived from the Fisher criterion [28], across the training images. Since [13], [15], [20] regulate the SVs empirically, it is difficult for these methods to handle the different variations found in different databases. In our method, by contrast, the discriminant criterion maximizes the ratio of the between-class distance to the within-class distance, so discriminative representations can be learnt by taking the variations, such as illumination, occlusion and disguise, into account during training, which brings robustness to different variations. The proposed LDSVDR model is solved via the sequential quadratic programming (SQP) method [33], [40]. Experiments on illumination variations, different kinds of occlusion, real disguises and the face sketch recognition task are conducted on three public face databases. LDSVDR achieves the best results in most cases and is more stable than the other SVD representations under different variations.
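
As a rough illustration of this learning idea, the sketch below learns one shared SV vector over per-image SVD basis sets by minimizing a within-class/between-class distance ratio with SciPy's SLSQP solver (an SQP-type method). It is a minimal conceptual sketch under our own assumptions: the paper's actual objective, constraints and solver settings may differ, and the normalization constraint sum(s) = k is ours.

```python
import numpy as np
from scipy.optimize import minimize

def learn_common_svs(images, labels, k=20):
    """Conceptual LDSVDR-style sketch (our assumptions, not the paper's exact model):
    learn one shared SV vector s so that the representations U_i diag(s) V_i^T have
    small within-class and large between-class distances. Assumes all images are the
    same size with min(m, n) >= k."""
    bases = []
    for A in images:                                    # individual SVD basis set per image
        U, _, Vt = np.linalg.svd(A, full_matrices=False)
        bases.append((U[:, :k], Vt[:k, :]))

    def represent(s):
        return [U @ np.diag(s) @ Vt for U, Vt in bases]

    def cost(s):                                        # within/between distance ratio (minimize)
        X = represent(s)
        within = between = 0.0
        for i in range(len(X)):
            for j in range(i + 1, len(X)):
                d = np.linalg.norm(X[i] - X[j]) ** 2
                if labels[i] == labels[j]:
                    within += d
                else:
                    between += d
        return within / (between + 1e-12)

    # sum(s) = k is an assumed normalization to rule out the trivial zero solution.
    res = minimize(cost, np.ones(k), method="SLSQP",    # SLSQP is an SQP-type solver
                   bounds=[(0.0, None)] * k,
                   constraints={"type": "eq", "fun": lambda s: np.sum(s) - k})
    return res.x
```

In use, `s = learn_common_svs(train_images, train_labels)` would be computed once on the training set, and each new image would then be represented as U diag(s) V^T using its own basis set before classification; this usage pattern is our inference from the description above, not a quoted procedure.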

The rest of this paper is organized as follows. Section 2 briefly reviews three related works. Section 3 presents the details of LDSVDR. Section 4 reports extensive experiments that demonstrate the efficacy of our method, and Section 5 offers our conclusions.

Related work

In this section, we briefly review FSVDR [15], DSVDR [20] and NOMR [13].

Let $A$ be an $m \times n$ ($m \ge n$ without loss of generality) grayscale face image. As seen from Eq. (1), the singular value decomposition (SVD) of $A$ is defined as

$$A = \bar{U}\,\bar{S}\,\bar{V}^{T}, \qquad (1)$$

where $\bar{U} = [u_1, \ldots, u_m] \in \mathbb{R}^{m \times m}$ and $\bar{V} = [v_1, \ldots, v_n] \in \mathbb{R}^{n \times n}$ are orthogonal matrices, $\bar{S} = [D \;\; 0]^{T}$, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $0$ is an $n \times (m-n)$ zero matrix, and $\lambda_i$, $i = 1, \ldots, n$, are the singular values (SVs) of the matrix $A$.

Let $k = \mathrm{rank}(A)$; we can further represent $A$ as

$$A = U S V^{T} = \sum_{i=1}^{k} \lambda_i\, u_i v_i^{T},$$

where $U = [u_1, \ldots, u_k] \in \mathbb{R}^{m \times k}$ and $V = [v_1, \ldots, v_k] \in \mathbb{R}^{n \times k}$ are column-orthonormal matrices and $S = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$.
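
As a quick numerical sanity check of the rank-$k$ expansion above (our own illustration, not part of the paper), the following NumPy snippet confirms that summing the $k$ rank-one layers $\lambda_i u_i v_i^{T}$ reproduces $A$.

```python
import numpy as np

# Verify A = sum_{i=1}^{k} lambda_i * u_i * v_i^T for the compact (rank-k) SVD.
m, n = 6, 4
A = np.random.rand(m, n)                               # random m x n "image", m >= n
U, lam, Vt = np.linalg.svd(A, full_matrices=False)     # U: m x n, lam: (n,), Vt: n x n
k = np.linalg.matrix_rank(A)
A_rebuilt = sum(lam[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
print(np.allclose(A, A_rebuilt))                       # True up to floating-point error
```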

SVD basis set is dominant for face recognition

As we know, SVD divides the original face image into two parts: the SVs and the basis set. In past decades, several face recognition methods [16], [35], [39] used the SVs as the representation feature and achieved interesting results on small-sample-size databases. However, these methods have never been tested on large face databases, so their effectiveness there remains unknown (especially under variations in illumination and viewpoint). In [29], Tian et al. raised a

Experiments

Three renowned face databases are used in our experiments: the Extended Yale B database [22], the CUHK face sketch database [24] and the AR face database [21]. Our method is tested and compared against 13 other popular methods. Among these, two are subspace-based methods, namely Eigenfaces [6] and Fisherfaces [8]; three are local descriptor-based methods, namely local binary pattern (LBP) [12], Gabor [11] and histogram of

Conclusions

This paper presents a novel method named learning discriminative singular value decomposition representation (LDSVDR). We build an individual SVD basis set for each image and then learn a common set of singular values by taking the information in the basis sets into account according to a discriminant criterion across the training images. The main contribution of this paper is the introduction of the discriminant learning process, which brings robustness to different variations. The proposed

Acknowledgments

This work was partially supported by the National Science Fund for Distinguished Young Scholars under Grant nos. 61125305, 91420201, 61472187, 61233011 and 61373063, the Key Project of Chinese Ministry of Education under Grant no. 313030, the 973 Program No. 2014CB349303, Fundamental Research Funds for the Central Universities No. 30920140121005, and Program for Changjiang Scholars and Innovative Research Team in University No. IRT13072.

References (45)

  • E. Ganic, A.M. Eskicioglu, Robust DWT–SVD domain image watermarking: embedding data in all frequencies, in: Proceedings...

  • M. Kirby et al., Application of the Karhunen–Loeve procedure for the characterization of human faces, IEEE Trans. PAMI (1990)

  • A. Martinez et al., PCA versus LDA, IEEE Trans. PAMI (2001)

  • M. Turk et al., Eigenfaces for recognition, J. Cogn. Neurosci. (1991)

  • J. Yang et al., Two-dimensional PCA: a new approach to face representation and recognition, IEEE Trans. PAMI (2004)

  • P. Belhumeur et al., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. PAMI (1997)

  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. PAMI (2005)

  • J. Yang et al., Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. PAMI (2007)

  • C. Liu et al., Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition, IEEE Trans. Image Process. (2002)

  • T. Ahonen et al., Face description with local binary patterns: application to face recognition, IEEE Trans. PAMI (2006)

  • N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: CVPR,...

  • Y. Wang et al., Face identification based on singular values decomposition and data fusion, Chin. J. Comput. (2000)

    Ying Tai received the B.S. degree in the School of Computer Science and Engineering from Nanjing University of Science and Technology (NUST), Nanjing, China, in 2012. Currently, he is pursuing the Ph.D. degree in NUST. His current research interests include pattern recognition, computer vision, and especially face recognition.

    Jian Yang (M'08) received the B.S. degree in mathematics from Xuzhou Normal University in 1995, the M.S. degree in applied mathematics from Changsha Railway University in 1998, and the Ph.D. degree in pattern recognition and intelligence systems from the Nanjing University of Science and Technology (NUST) in 2002. In 2003, he was a postdoctoral researcher at the University of Zaragoza. From 2004 to 2006, he was a Postdoctoral Fellow at the Biometrics Centre of The Hong Kong Polytechnic University. From 2006 to 2007, he was a Postdoctoral Fellow in the Department of Computer Science at the New Jersey Institute of Technology. He is now a professor in the School of Computer Science and Technology of NUST. He is the author of more than 80 scientific papers in pattern recognition and computer vision. His journal papers have been cited more than 1800 times in the ISI Web of Science and 3000 times in Google Scholar. His research interests include pattern recognition, computer vision and machine learning. He is currently an Associate Editor of Pattern Recognition Letters and IEEE Transactions on Neural Networks and Learning Systems.

    Lei Luo received the B.S. degree from Xinyang Normal University, Xinyang, China, in 2008, and the M.S. degree from Nanchang University, Nanchang, China, in 2011. He is currently pursuing the Ph.D. degree in pattern recognition and intelligence systems at the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China. His current research interests include pattern recognition and optimization algorithms.

    Fanlong Zhang received the B.S. and M.S. degrees in 2007 and 2010, respectively. Currently, he is pursuing the Ph.D. degree with the School of Computer Science and Engineering, Nanjing University of Science and Technology (NUST), Nanjing, China. His current research interests include pattern recognition and optimization.

    Jianjun Qian received the B.S. and M.S. degrees in 2007 and 2010, respectively, and the Ph.D. degree in pattern recognition and intelligence systems from Nanjing University of Science and Technology (NUST), in 2014. Now, he is an Assistant Professor in the School of Computer Science and Engineering of NUST. His research interests include pattern recognition, computer vision, and especially face recognition.
