
Pattern Recognition

Volume 66, June 2017, Pages 144-152

Pose-invariant face recognition with homography-based normalization

https://doi.org/10.1016/j.patcog.2016.11.024

Highlights

  • We propose a highly efficient and accurate pose normalization approach for pose-invariant face recognition.

  • This is the first time that homography is utilized for face synthesis.

  • The proposed approach covers the full range of pose variations within ±90° of yaw.

  • The proposed approach outperforms existing methods on four popular face databases.

Abstract

Pose-invariant face recognition (PIFR) refers to the ability to recognize face images with arbitrary pose variations. Among existing PIFR algorithms, pose normalization has proved to be an effective approach that preserves texture fidelity, but it usually depends on precise 3D face models or incurs high computational cost. In this paper, we propose a highly efficient PIFR algorithm that effectively handles the main challenges caused by pose variation. First, a dense grid of 3D facial landmarks is projected to each 2D face image, which enables feature extraction in a pose-adaptive manner. Second, for the local patch around each landmark, an optimal warp is estimated based on homography to correct the texture deformation caused by pose variations. The reconstructed frontal-view patches are then utilized for face recognition with traditional face descriptors. The homography-based normalization is highly efficient, and the synthesized frontal face images are of high quality. Finally, we propose an effective approach for occlusion detection, which enables face recognition with visible patches only. The proposed algorithm therefore effectively handles the main challenges in PIFR. Experimental results on four popular face databases demonstrate that the proposed approach performs well in both constrained and unconstrained environments.

Introduction

Face recognition is one of the most important biometric techniques. It has wide potential in many real-world applications, e.g., video surveillance, access control systems, forensics and security, and social networks [1], [2], [3], [4], [5], [6], [7], [8], [9]. The key advantage of face recognition lies in its non-intrusive property, which means it can work in a passive manner. However, the downside of this property is that the appearance of face images is vulnerable to a number of factors, e.g., pose, illumination, occlusion, and expression variations [10]. In particular, pose variation is the primary stumbling block to realizing the full potential of face recognition, as argued in a recent survey [11]. In this paper, we study the pose-invariant face recognition (PIFR) problem, which aims to recognize face images captured under arbitrary poses.

Pose variation dramatically changes the appearance of face images. The appearance difference caused by pose variations usually exceeds the intrinsic appearance difference between subjects. As illustrated in Fig. 1, pose variation results in displacement of facial components, non-linear texture warping, and self-occlusion. Moreover, pose variation often combines with other factors, e.g., image blur and illumination variation, to jointly affect face recognition, as shown in Fig. 2. To handle these challenges, a number of PIFR approaches have been proposed. Among existing approaches, pose normalization is advantageous because it produces pose-free faces with high fidelity and usually requires no training data. Existing pose normalization approaches can be divided into two categories: 2D methods [12], [13], [14] and 3D methods [15], [16], [17], [18]. As the face is essentially a 3D object, the appearance change caused by pose variation can be modeled more accurately with an ideal 3D face model. However, 3D modeling from a single 2D face image is an ill-posed problem and thus difficult in practice. Another disadvantage of 3D methods is that they depend on complicated computer graphics techniques for face image rendering. In comparison, 2D methods conduct pose normalization within the 2D image domain. Because the 2D projection loses one degree of freedom, accurate pose normalization within the 2D image domain is difficult. Existing 2D methods usually adopt computationally expensive algorithms, e.g., Markov Random Fields (MRF) [14] or the Lucas–Kanade algorithm [12], to promote accuracy in pose normalization.

In this paper, we propose a novel pose normalization approach that combines the advantages of both 3D methods and 2D methods. In our approach, a dense grid of 3D facial landmarks is projected onto the 2D image by aligning five semantically corresponding facial landmarks between the face image and a generic 3D face model. The grid of facial landmarks efficiently establishes dense correspondence of face images across pose. Next, by assuming that the local patch around each facial landmark is a simple planar surface, the transformation of the local patch across pose is efficiently approximated by a homography estimated from the landmarks in the patch. With the estimated transformation, the non-linear texture warping across pose is corrected. Compared with existing 2D pose normalization methods, e.g., Markov Random Fields (MRF) [14] and the Lucas–Kanade algorithm [12], the homography-based method estimates the local warp highly efficiently.
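To make the homography step concrete, the sketch below estimates a 3×3 homography from corresponding landmark points with the standard direct linear transform (DLT) and applies it to warp points. This is a minimal numpy illustration of the general technique, not the paper's implementation; the function names are ours.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of corresponding landmark points, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 9))
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        # Each correspondence contributes two linear constraints on H.
        A[2 * i]     = [-x, -y, -1,  0,  0,  0, u * x, u * y, u]
        A[2 * i + 1] = [ 0,  0,  0, -x, -y, -1, v * x, v * y, v]
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply homography H to (N, 2) points in homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homo @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Because only a small linear system is solved per patch, this is far cheaper than iterative warp estimation such as the Lucas–Kanade algorithm.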

The above method reconstructs frontal-view face image patches from unoccluded facial textures. Existing feature extraction methods, e.g., local descriptors, can then be applied to the corrected face patches to compose the face representation. Occlusion detection is therefore important to distinguish occluded facial textures from visible ones. We further propose a method for occlusion detection and a scheme to extract fixed-length face representations from pose-varied face images. Exploiting face symmetry, we extract patch-level features from both the original face image and its horizontally flipped version. For each patch pair of the two images, the features are fused by weighting according to their visibility. The patch-level feature vectors are then concatenated to compose the complete face representation. In this way, we obtain a fixed-length representation for each face, regardless of pose. The advantage of this scheme is that it makes the best use of visible facial textures for face recognition.
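The symmetry-based fusion can be sketched as follows: each patch's features from the original and mirrored images are averaged with weights proportional to their visibility scores, then concatenated. This is an illustrative sketch under the assumption that visibility is given as a per-patch score in [0, 1]; the weighting details of the paper's scheme may differ.

```python
import numpy as np

def fuse_representation(feats_orig, feats_flip, vis_orig, vis_flip):
    """Fuse per-patch features from an image and its mirror by visibility.

    feats_orig, feats_flip: (P, D) arrays, one D-dim feature per patch.
    vis_orig, vis_flip: length-P visibility scores in [0, 1].
    Returns a fixed-length (P * D,) face representation.
    """
    w_o = np.asarray(vis_orig, dtype=float)
    w_f = np.asarray(vis_flip, dtype=float)
    total = w_o + w_f
    total[total == 0] = 1.0  # avoid division by zero for fully occluded patches
    w_o, w_f = w_o / total, w_f / total
    # Weighted average per patch, then flatten into one fixed-length vector.
    fused = w_o[:, None] * feats_orig + w_f[:, None] * feats_flip
    return fused.reshape(-1)
```

A patch that is occluded in the original image but visible in the mirrored one thus contributes only its visible features, which is what makes the representation robust to self-occlusion.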

In this paper, we term the homography-based pose normalization method HPN. The remainder of the paper is organized as follows: Section 2 briefly reviews related work on PIFR. The proposed HPN method is detailed in Section 3. Face representation based on HPN is described in Section 4. Experimental results are presented in Section 5, leading to conclusions in Section 6.

Section snippets

Related works

A number of approaches have been proposed to solve the PIFR problem from various perspectives. Among existing works, pose-robust feature extraction and pose normalization are the two most important categories of methods. For a comprehensive review of existing methods, we refer readers to a recent survey [11]. In this section, we review only the works most relevant to this paper.

Methods falling in the pose-robust feature extraction category can be further divided into two types: handcrafted

Homography-based pose normalization

In this section, we describe the HPN approach for patch-wise frontal-view synthesis. The main idea is to assume that a local patch on the face is a planar surface; its different views across pose are then related by a homography, whose parameters can be estimated from a set of semantically corresponding facial landmarks. The flowchart of HPN is illustrated in Fig. 3, Fig. 4. First, we align each 2D face image and a generic 3D face model by orthogonal projection and obtain a dense grid of pose
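The alignment step above, fitting an orthogonal (affine) projection of the generic 3D model to the detected 2D landmarks, can be sketched as a linear least-squares fit. This is a hedged illustration of the general technique under a scaled-orthographic camera assumption; the function names are ours, not the paper's.

```python
import numpy as np

def fit_orthographic(model_3d, image_2d):
    """Fit an affine camera (2x3 matrix P, 2-vector t) by least squares,
    so that image_2d ~ model_3d @ P.T + t.

    model_3d: (N, 3) landmark coordinates on the generic 3D face model.
    image_2d: (N, 2) corresponding landmark positions in the face image.
    """
    model_3d = np.asarray(model_3d, dtype=float)
    image_2d = np.asarray(image_2d, dtype=float)
    # Augment with a constant column so the translation is fitted jointly.
    X = np.hstack([model_3d, np.ones((len(model_3d), 1))])   # (N, 4)
    sol, *_ = np.linalg.lstsq(X, image_2d, rcond=None)       # (4, 2)
    P, t = sol[:3].T, sol[3]
    return P, t

def project(P, t, pts_3d):
    """Project 3D model points into the image with the fitted camera."""
    return np.asarray(pts_3d, dtype=float) @ P.T + t
```

Once the camera is fitted from a handful of detected landmarks, the full dense grid of 3D model landmarks can be projected with `project` to establish dense correspondence across pose.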

HPN-based face representation

We extract features from each synthesized frontal-view patch. The type of feature is flexible; in this paper, we mainly employ the Dual-Cross Patterns (DCP) [33] descriptor for feature extraction. In detail, for each pose-normalized patch, we extract a DCP histogram feature from a cell of size N×N pixels (N ≤ M) centered on the central landmark. The DCP feature vectors extracted from all patches are concatenated to form the representation of the face image.
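The extract-per-cell-then-concatenate scheme can be sketched generically. DCP itself is too involved for a short snippet, so the sketch below substitutes a simple gradient-orientation histogram per cell as a stand-in descriptor; the cell size and histogram are illustrative, not the paper's parameters.

```python
import numpy as np

def patch_histogram(image, center, cell_size=16, bins=8):
    """Gradient-orientation histogram over a cell_size x cell_size cell
    centred on a landmark (a simple stand-in for a DCP histogram)."""
    r = cell_size // 2
    y, x = center
    cell = image[y - r:y + r, x - r:x + r].astype(float)
    gy, gx = np.gradient(cell)               # per-pixel image gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Magnitude-weighted orientation histogram, L2-normalized.
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def face_representation(image, landmarks, cell_size=16, bins=8):
    """Concatenate per-landmark cell histograms into one fixed-length vector."""
    return np.concatenate([patch_histogram(image, c, cell_size, bins)
                           for c in landmarks])
```

Because every face contributes the same number of landmarks and each cell yields a fixed-size histogram, the concatenated vector has the same length regardless of pose.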

Pose variation results

Experiments

In this section, we conduct extensive experiments to demonstrate the effectiveness of HPN. Two categories of experiments are conducted. First, face identification experiments are conducted on the three most popular databases for PIFR research, i.e., FERET [34], CMU-PIE [35], and Multi-PIE [36]. Images in these three databases were captured in laboratory environments. The gallery set for each database is composed of frontal face images. Probe images are divided into different sets according to

Conclusion

Wide-range pose variation is a major challenge for fully automatic face recognition. Among existing approaches, pose normalization is an effective solution because it preserves high-fidelity facial textures without requiring training data. In this paper, we propose a highly efficient homography-based pose normalization approach named HPN. HPN effectively handles the three major challenges of PIFR, i.e., loss of semantic correspondence, non-linear facial texture warping, and occlusion;

Acknowledgment

This work is supported by Australian Research Council Projects FT-130101457 and DP-140102164.

Changxing Ding received the Ph.D. degree from the University of Technology Sydney, Australia. His research interests include computer vision and machine learning, with a particular focus on face recognition.

References (50)

  • F. Schroff, D. Kalenichenko, J. Philbin, Facenet: a unified embedding for face recognition and clustering, in:...
  • R. He et al., Two-stage nonnegative sparse representation for large-scale face recognition, IEEE Trans. Neural Netw. Learn. Syst. (2013)
  • Y. Sun et al., Complementary cohort strategy for multimodal face pair matching, IEEE Trans. Inf. Forensics Secur. (2016)
  • D.F. Smith et al., Face recognition on consumer devices: reflections on replay attacks, IEEE Trans. Inf. Forensics Secur. (2015)
  • C. Ding et al., A comprehensive survey on pose-invariant face recognition, ACM Trans. Intell. Syst. Technol. (2016)
  • A.B. Ashraf, S. Lucey, T. Chen, Learning patch correspondences for improved viewpoint invariant face recognition, in:...
  • H. Gao, H.K. Ekenel, R. Stiefelhagen, Pose normalization for local appearance-based face recognition, in: Proceedings...
  • H.T. Ho et al., Pose-invariant face recognition using Markov random fields, IEEE Trans. Image Process. (2013)
  • V. Blanz et al., Face recognition based on fitting a 3D morphable model, IEEE Trans. Pattern Anal. Mach. Intell. (2003)
  • M.W. Lee et al., Pose-invariant face recognition using a 3D deformable model, Pattern Recognit. (2003)
  • C. Ding et al., Multi-task pose-invariant face recognition, IEEE Trans. Image Process. (2015)
  • Z. Cao, Q. Yin, X. Tang, J. Sun, Face recognition with learning-based descriptor, in: Proceedings of IEEE Conference on...
  • D. Chen, X. Cao, F. Wen, J. Sun, Blessing of dimensionality: high-dimensional feature and its efficient compression for...
  • D. Yi, Z. Lei, S.Z. Li, Towards pose robust face recognition, in: Proceedings of IEEE Conference on Computer Vision and...
  • S.R. Arashloo et al., Energy normalization for pose-invariant face recognition based on MRF model image matching, IEEE Trans. Pattern Anal. Mach. Intell. (2011)

Dacheng Tao is a Professor of computer science with the Centre for Artificial Intelligence and the Faculty of Engineering and Information Technology at the University of Technology Sydney. He mainly applies statistics and mathematics to data analytics problems, and his research interests span computer vision, data science, image processing, machine learning, and video surveillance. His research results have been expounded in one monograph and 100+ publications at prestigious journals and prominent conferences, such as IEEE TPAMI, T-NNLS, T-IP, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM, and ACM SIGKDD, with several best paper awards, such as the best theory/algorithm paper runner-up award at IEEE ICDM07, the best student paper award at IEEE ICDM13, and the 2014 ICDM 10-Year Highest-Impact Paper Award.
