Elsevier

Neurocomputing

Volume 214, 19 November 2016, Pages 944-957
Robust face recognition via sparse boosting representation

https://doi.org/10.1016/j.neucom.2016.06.071

Abstract

Linear representation has recently provided an effective approach to robust face recognition. However, existing linear representation methods cannot adapt to the variations in a facial image, so their generalization ability is limited. In this paper, we propose sparse boosting representation classification (SBRC) for robust face recognition. To improve the effectiveness of representation coding, SBRC includes an error detection machine (EDM) with multiple error detectors (ED), which detects and removes destroyed features (i.e. pixels) in a testing image. SBRC has three advantages. First, it generalizes well, since the EDM can self-adjust the number of EDs according to different variations. Second, the EDM boosts the sparsity of the coding vector. Third, its implementation is simple and efficient, as the EDM is based on the l2-norm. Five popular face image databases (AR, Extended Yale B, ORL, FERET and LFW) were used to validate the performance of SBRC. Comparisons with state-of-the-art face recognition methods confirm the superiority of SBRC.

Introduction

Face recognition (FR) has attracted extensive attention in pattern recognition and computer vision because of its wide range of real-world applications, for example mobile payment and video surveillance. Several well-known methods have been proposed in the past decades [1], [2], [3], [4], [5], [6], [7], e.g. Eigenface [1] and Fisherface [3]. These methods perform well when the face images are taken in a constrained environment. However, they tend to fail when a probe facial image is affected by unconstrained factors, e.g. illumination, occlusion and corruption. To make FR more feasible in practical systems, a number of studies have been published. They attempt to improve the robustness of FR from two perspectives, i.e. feature extraction and classifier design.

Traditional feature extraction methods, such as Eigenface [1] and Gabor [8], cannot easily handle the complex variations in facial images. Therefore, more robust features have been proposed in recent years [9], [10], [11], [12], [13], [14]. For instance, to eliminate the influence of illumination variation effectively, Zhang et al. [13] proposed Gradientfaces. Moreover, Chan et al. [14] proposed a novel descriptor based on local phase quantization (LPQ), which deals robustly with blur. Classifiers, e.g. nearest neighbor (NN) [15] and support vector machines (SVM) [16], are used to determine the category of a query image in the feature space. Since NN is a simple but efficient classifier, it is widely employed in FR. However, its performance is unstable in practice, because it uses little information from the training set to evaluate the testing image, i.e. only one sample is used in the classification process. To achieve more stable matching, two and three training samples per subject are employed to estimate the category of the probe in nearest feature line (NFL) [17] and in NSP [18], respectively. Moreover, in Ref. [19], Naseem et al. used all samples of a subject to linearly represent the testing image (linear regression for classification, LRC), i.e. seeking the subspace nearest to the testing image. Nevertheless, LRC still does not fully utilize the discriminative information in the training samples.

Recently, FR approaches based on collaborative representation (CR) have become a very popular way to solve the complex problems of FR in uncontrolled environments [20], [21], [22], [23], [24], [25], [26], [27]. Unlike in LRC, in CR a testing image y is represented by all of the training samples A, i.e. $y \approx Ax$, where x is a coding vector. CR-based methods can therefore exploit the training set more fully than LRC: CR uses not only the similarity between the training samples and the testing sample, but also the correlation among the training samples. A key problem of these CR-based methods is how to measure the distance between the training samples and the query image, i.e. the residual $e = y - Ax$. In [20], [21], the difference between the training images and a testing image is measured by the l2-norm, i.e. $d=\|e\|_2^2$. In [28], [29], the l1-norm is used to estimate the representation residuals, i.e. $d=\|e\|_1$.
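The two distance measures can be compared on a toy example. The following is a minimal sketch in plain Python, assuming the residual is taken as e = y − Ax; the matrix, coding vector and helper names (`residual`, `l2_squared`, `l1`) are illustrative and do not come from the paper.

```python
def residual(A, x, y):
    """Return e = y - A x for a row-major matrix A (m rows, n columns)."""
    return [yi - sum(aij * xj for aij, xj in zip(row, x))
            for row, yi in zip(A, y)]


def l2_squared(e):
    """Squared l2-norm of the residual, d = ||e||_2^2."""
    return sum(v * v for v in e)


def l1(e):
    """l1-norm of the residual, d = ||e||_1."""
    return sum(abs(v) for v in e)


# Toy data (hypothetical): 4 pixels, 2 training samples as columns of A.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [2.0, 0.0]]
x = [1.0, 2.0]
y_clean = [1.0, 2.0, 3.0, 2.0]       # exactly A x
y_occluded = [1.0, 2.0, 3.0, 12.0]   # last pixel corrupted by +10

e_clean = residual(A, x, y_clean)
e_occ = residual(A, x, y_occluded)
print("clean:   ", l2_squared(e_clean), l1(e_clean))
print("occluded:", l2_squared(e_occ), l1(e_occ))
```

A single corrupted pixel contributes quadratically to the l2 distance but only linearly to the l1 distance, which is one way to see why the choice of error function matters once the probe is contaminated.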

When the distribution of e is Gaussian or Laplacian, it is reasonable to use the l2-norm or the l1-norm as the error function, respectively. These error functions can fail when the distribution of e changes because the query image is contaminated by corruption or occlusion. Consequently, many complex models based on robust regression theory [30], [31], [32] are employed to measure the difference between the training samples and a probe. In Refs. [33], [34], the Welsch M-estimator is used to estimate the representation residuals. To reduce the influence of outliers in e on the measurement, in Refs. [35], [36], [37] the pixels in y are assigned different weights by a logistic function. Since large residuals are assigned small weights, these methods obtain a more stable distance in complex situations such as corruption and occlusion. These models handle large residuals clearly, i.e. they treat them as outliers. Nevertheless, it is hard for them to assign accurate weights to the middle residuals (i.e. those that are neither too large nor too small). The recognition result may change if the middle residuals are not given appropriate weights, and in these models the weights of the middle residuals are generally controlled by manually designed parameters. In other words, the performance of these methods is very sensitive to the parameter values. Furthermore, finding an optimal parameter set for them is often very time-consuming.
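The sensitivity of the middle residuals to the parameters can be illustrated with a small sketch. It assumes a generic logistic-type weight of the squared residual; the function name `logistic_weight` and the parameters `mu` (steepness) and `delta` (threshold on the squared residual) are hypothetical names chosen here, and the exact weighting functions of Refs. [35], [36], [37] may differ.

```python
import math


def logistic_weight(e, mu, delta):
    """Logistic-type weight: near 1 when e**2 << delta, near 0 when e**2 >> delta.

    mu and delta are illustrative parameters, not values from the paper.
    """
    z = mu * (e * e - delta)
    z = min(z, 700.0)  # keep math.exp from overflowing for very large residuals
    return 1.0 / (1.0 + math.exp(z))


# Small, middle and large residuals under two threshold settings.
for delta in (0.5, 0.3):
    weights = [logistic_weight(e, mu=8.0, delta=delta) for e in (0.1, 0.7, 3.0)]
    print(delta, [round(w, 3) for w in weights])
```

In this toy setting the small residual keeps a weight close to one and the large residual a weight close to zero under both settings, while the weight of the middle residual changes substantially when `delta` is adjusted.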

Generally speaking, an error model should be designed according to the distribution of the errors. The previous models adopt a similar strategy for estimating the representation error: they first assume a distribution for the representation residuals, and then design a corresponding error function. Since the error model cannot be changed once it is constructed, these methods cope poorly with cases where the real distribution differs from the assumed one. Although adopting a regularization term with a sparsity constraint can alleviate this problem, the type and location of noise in a facial image are unfortunately unpredictable. Moreover, different variations of a facial image may produce different residual distributions. Hence, the generalization of these models is still problematic.
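The link between an assumed residual distribution and the resulting error function can be made explicit with the standard maximum-likelihood argument. The derivation below is a textbook sketch rather than a step taken from the paper; it assumes the residual entries are independent and identically distributed, and the scale parameters $\sigma$ and $b$ are introduced here for illustration.

```latex
% Assumption: the entries e_i of the residual are i.i.d.;
% \sigma and b are scale parameters introduced for this sketch.
\begin{align*}
\text{Gaussian: } p(e_i) \propto \exp\!\left(-\frac{e_i^2}{2\sigma^2}\right)
  &\;\Rightarrow\; -\log \prod_{i=1}^{m} p(e_i)
   = \frac{1}{2\sigma^2}\,\|e\|_2^2 + \text{const},\\
\text{Laplacian: } p(e_i) \propto \exp\!\left(-\frac{|e_i|}{b}\right)
  &\;\Rightarrow\; -\log \prod_{i=1}^{m} p(e_i)
   = \frac{1}{b}\,\|e\|_1 + \text{const}.
\end{align*}
```

Minimizing the negative log-likelihood therefore yields the l2-norm under the Gaussian assumption and the l1-norm under the Laplacian assumption, and neither remains optimal when occlusion or corruption changes the distribution of the residuals.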

To address the above problems, we propose sparse boosting representation based classification (SBRC) for robust face recognition in this paper. SBRC contains two parts: an error detection machine (EDM) and a classifier. The EDM consists of multiple error detectors (e.g. l2-norm based), and the classifier is in the same category as the collaborative representation based classifier (CRC) [21]. SBRC is more flexible than the previous models in dealing with residuals of diverse distributions, since the EDM can adjust itself (i.e. choose a different number of error detectors) according to different types of noise. In addition, each detector only detects and removes a few pixels in the testing image that are corrupted by non-Gaussian noise. The weighting functions in Refs. [33], [35], [36], [37] can be viewed as a special EDM which includes only one error detector. Because the heavily contaminated pixels in the testing facial image are removed by the EDM in SBRC, the sparsity of the coding vector is boosted automatically in the classification phase. The main highlights of our work are summarized as follows:

  • 1) Good generalization ability. Our method has a powerful EDM which can deal with different types of residuals by automatically choosing a different number of error detectors (ED). In other words, our model adapts to the variations in a testing facial image. In addition, the EDM adopts an elimination rule which removes the largest residuals, so our method does not have to deal with the middle residuals. This mechanism helps the proposed approach detect the various errors of the query image precisely, which gives the model strong generalization ability in practice.

  • 2) Sparse coding vector. Since the destroyed pixels in the testing image are removed by the EDM, the training samples belonging to the same class as the testing sample become the most competitive ones in the collaborative representation. Hence, our method can obtain an efficient and sparse coding vector in the classification phase.

  • 3) Simple and effective implementation. The proposed method is more robust than the ordinary linear regression models (OLRM) (i.e. those whose fidelity term is based on the l2-norm), and it also generalizes better than the robust regression models (RRM) (i.e. those whose fidelity term employs a robust function, for example the logistic function). Nevertheless, our method is easier to implement than RRM, because we do not have to decide which error function should be used to estimate specific representation residuals: the EDM automatically generates an appropriate strategy for different types of noise. In addition, because the proposed method only involves the l2-norm, we do not need to spend extra time on parameter optimization for the fidelity term, and the l2-norm based objective function has an analytical solution.
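The elimination idea behind these highlights can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the authors' algorithm: this excerpt does not give the EDM procedure, so the loop structure, the fixed number of detectors, the number of pixels dropped per detector, the ridge parameter `lam` and all function names are hypothetical. It uses the closed-form l2 (ridge) solution for coding and plain Python to stay dependency-free.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def ridge_code(A, y, rows, lam):
    """Closed-form l2 coding x = (A^T A + lam I)^-1 A^T y on the kept rows."""
    n = len(A[0])
    G = [[sum(A[r][i] * A[r][j] for r in rows) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(A[r][i] * y[r] for r in rows) for i in range(n)]
    return solve(G, b)


def detect_and_remove(A, y, n_detectors, drop_per_detector, lam=1e-3):
    """Hypothetical elimination loop: each pass drops the worst-fit pixels."""
    kept = list(range(len(y)))
    for _ in range(n_detectors):
        x = ridge_code(A, y, kept, lam)
        res = {r: abs(y[r] - sum(a * xi for a, xi in zip(A[r], x)))
               for r in kept}
        worst = sorted(kept, key=lambda r: res[r], reverse=True)[:drop_per_detector]
        kept = [r for r in kept if r not in worst]
    return kept, ridge_code(A, y, kept, lam)


# Toy probe (hypothetical): 6 pixels, 2 training samples, true code [1, 0].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [1.0, -1.0]]
y = [1.0, 0.0, 11.0, 1.0, 2.0, 1.0]  # pixel 2 corrupted by +10

x_before = ridge_code(A, y, range(len(y)), 1e-3)
kept, x_after = detect_and_remove(A, y, n_detectors=1, drop_per_detector=1)
print("coding with all pixels:   ", x_before)
print("kept pixels:              ", kept)
print("coding after elimination: ", x_after)
```

In this toy case the corrupted pixel has the largest residual and is removed by the single detector, after which the coding vector concentrates on the first training sample. In the paper the EDM chooses the number of detectors itself, whereas the sketch takes it as a fixed argument.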

The rest of this paper is organized as follows. In Section 2, we briefly review major works on OLRM and RRM. In Section 3, we present the proposed method. In Section 4, we analyze the advantages of our method in complex scenarios. In Section 5, we validate the performance of our method via extensive experiments and compare it with state-of-the-art methods. Section 6 concludes the paper.

Related work

In this section, we briefly introduce two types of CR-based research, i.e. OLRM and RRM. In this paper, we assume that the matrix of training samples A consists of n facial images, each with m features (i.e. pixels), i.e. $A=[a_1,a_2,\ldots,a_i,\ldots,a_n]\in\mathbb{R}^{m\times n}$, where $a_i$ denotes the ith sample in A. CR implies that all of the training samples are used to linearly represent the testing sample $y=[y_1,y_2,\ldots,y_i,\ldots,y_m]^T\in\mathbb{R}^{m\times 1}$, i.e. $y=a_1x_1+a_2x_2+\cdots+a_ix_i+\cdots+a_nx_n+e$, where $e=[e_1,e_2,\ldots,e_i,\ldots,e_m]^T$ is an error vector, x=[x1,

Proposed method

In this section, we first detail how the EDM in SBRC detects noise (i.e. outliers) in the testing image and how it determines the number of EDs for a given query image. Then, we describe the classifier of the proposed method. Finally, the time complexity of SBRC is analyzed.

Analysis of our method

In this section, we analyze the merits of the proposed method from three aspects, i.e. generalization, sparsity and implementation. To better illustrate the characteristics of SBRC, we include some experiments where appropriate. More experiments are shown in the next section, i.e. the experiment section.

Experiments

In this section, we show the experimental results of SBRC in FR with variations such as illumination, expression, gesture and occlusion. To evaluate the proposed method, we compare it with six related state-of-the-art methods for robust FR. The compared methods include linear regression for classification (LRC) [19], CRC [21], sparse representation based classification (SRC) [20], correntropy-based sparse representation (CESR) [33], robust sparse coding (RSC) [35] and regularized robust

Conclusions

In this paper, sparse boosting representation based classification (SBRC) is proposed for robust face recognition. SBRC can robustly deal with testing images distorted by variations including illumination changes, gesture changes, corruption, occlusion, and real disguises. In previous robust regression models (RRM), the distribution of the noise in the query image needs to be assumed before the error estimation function is designed. Therefore, the generalization of the approaches based on RRM is

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61202276 and 61403053, in part by the Chongqing Natural Science Foundation under Project Nos. cstc2014jcyjA40018, cstc2014jcyjA40022 and cstc2014kjrcqnrc40002, and in part by the Chongqing education committee under Grant Nos. KJ1500402 and KJ1500417.

References (53)

  • P.J. Phillips et al., The FERET database and evaluation procedure for face-recognition algorithms, Image Vis. Comput. (1998)
  • M. Turk et al., Eigenfaces for recognition, J. Cognit. Neurosci. (1991)
  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
  • M.S. Bartlett, Independent component representations for face recognition, Face Image Analysis by Unsupervised...
  • C. Liu et al., Independent component analysis of Gabor features for face recognition, IEEE Trans. Neural Netw. (2003)
  • J. Yang et al., Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • T. Ahonen et al., Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • J.G. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, J. Opt. Soc. Am. A (1985)
  • W.K. Wong et al., Joint tensor feature analysis for visual object recognition, IEEE Trans. Cybern. (2015)
  • Z. Lai et al., Multilinear sparse principal component analysis, IEEE Trans. Neural Netw. Learn. Syst. (2014)
  • T.P. Zhang et al., Face recognition under varying illumination using gradientfaces, IEEE Trans. Image Process. (2009)
  • C.H. Chan et al., Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • T.M. Cover et al., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory (1967)
  • B. Heisele, P. Ho, T. Poggio, Face recognition with support vector machines: Global versus component-based approach....
  • S.Z. Li et al., Face recognition using the nearest feature line method, IEEE Trans. Neural Netw. (1999)
  • J.-T. Chien et al., Discriminant waveletfaces and nearest feature classifiers for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2002)

Tao Liu received the B.S. degree from Chongqing University of Technology in 2013. He is currently working toward the M.S. degree in the College of Computer Science and Technology of Chongqing University of Posts and Telecommunications in Chongqing. His research interests include pattern recognition and machine learning, with a specific interest in face recognition.

Jian-Xun Mi received the B.S. degree in Automation from Sichuan University (SCU), Chengdu, China in 2004 and the Ph.D. degree in Pattern Recognition & Intelligent Systems from the University of Science and Technology of China (USTC), Hefei, China in 2010. He worked at the Bio-Computing Research Center at Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China as a Postdoctoral Research Fellow from Sept. 2011 to Sept. 2013. He is now an associate professor at Chongqing University of Posts and Telecommunications, Chongqing, China.

Ying Liu received the B.S. degree in 2013. She is currently working toward the M.S. degree in the College of Computer Science and Technology of Chongqing University of Posts and Telecommunications in Chongqing. Her research interests include pattern recognition and machine learning, with a specific interest in face recognition.

Chao Li received the B.S. degree from Zhongyuan University of Technology in 2014. He is currently working toward the M.S. degree in the College of Computer Science and Technology of Chongqing University of Posts and Telecommunications in Chongqing. His research interests include pattern recognition and machine learning, with a specific interest in face recognition.
