Neurocomputing

Volume 464, 13 November 2021, Pages 485-496

Deep representation alignment network for pose-invariant face recognition

https://doi.org/10.1016/j.neucom.2021.08.103

Abstract

With recent developments in convolutional neural networks and the increasing amount of data, there has been great progress in face recognition. Nevertheless, unconstrained settings remain challenging given their variations in illumination, expression, and pose. To handle such pose variation, we propose the deep representation alignment network (DRA-Net), which aligns the deep representation of a profile face with that of a frontal face. Composed of a denoising autoencoder (DAE) and a deep representation transformation (DRT) block, DRA-Net is trained end-to-end. The DAE recovers deep representations of face areas that are not visible at large pose angles, and the DRT block transforms the recovered deep representation from profile into near-frontal poses. We also employ cosine loss and pairwise training to mitigate the gap between frontal and profile representations and to reduce intra-class variation. Experimental results show that DRA-Net outperforms other state-of-the-art methods, particularly at large pose angles, on the LFW, YTF, Multi-PIE, CFP, IJB-A, and M2FPA benchmarks.

Introduction

Face recognition has played an important role in computer vision for decades. Due to the complexity of the problem, performance and flexibility were limited until the emergence of convolutional neural networks (CNNs). In recent years, although near-perfect frontal face recognition has become possible in controlled environments, in practice the performance of face recognition remains limited by uncontrollable factors such as illumination, expression, and large pose variation.

In this paper, we focus on the problem of pose variation. The corresponding research can be divided into improving the performance of stem CNN models and lessening the impact of pose variation. To enhance the performance of face recognition models, most methods adopt a margin penalty, such as SphereFace [1], CosFace [2], and ArcFace [3], to learn a more discriminative feature space. Although state-of-the-art models trained with these loss functions attain high accuracy on the IJB-A [4] benchmark, they still produce mismatched pairs containing frontal and profile faces of the same identity.
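
For concreteness, the sketch below shows an additive angular margin of the kind ArcFace [3] applies to the softmax logits; the scale s and margin m are common default values, and the code is an illustration rather than any of the cited implementations.

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weight, labels, s=64.0, m=0.5):
    """Cosine logits with an additive angular margin on the target class."""
    # Normalize features and class weights so logits become cosine similarities.
    emb = F.normalize(embeddings, dim=1)               # (B, D)
    w = F.normalize(weight, dim=1)                     # (C, D)
    cos = emb @ w.t()                                  # (B, C): cos(theta)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # Add the margin m only to the ground-truth class angle.
    onehot = F.one_hot(labels, num_classes=w.size(0)).bool()
    logits = torch.where(onehot, torch.cos(theta + m), cos)
    return s * logits                                  # feed into cross-entropy

# Usage: loss = F.cross_entropy(arcface_logits(feat, W, y), y)
```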

As mentioned above, pose variation affects the performance of face recognition no matter how powerful the loss function is. Hence, researchers seek methods for pose-invariant face recognition that learn identity representations from arbitrary poses of the same person. Such pose-invariant face recognition methods can be accomplished from two aspects: face frontalization or pose-invariant representation learning.

The success of frontal face recognition suggests that face frontalization could be helpful for pose-invariant face recognition. Because profile facial images include occluded face areas and thus lack face information, transforming or generating a frontal face from a profile face is difficult. Current methods are grouped into 2D-based and 3D-based approaches. In the first group, 2D-based methods employ generative adversarial networks (GANs) [5] that take facial images and head pose information as input: a generator synthesizes a frontal facial image from an arbitrary-pose input in an effort to mislead a discriminator, as in the disentangled representation learning GAN [6], the pose-weighted GAN [7], and the progressive pose normalization GAN [8]. Once the generator has learned enough to fool the discriminator, the synthesized frontal facial image it yields represents the same identity.
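
The following sketch illustrates a typical generator objective for such a frontalization GAN, combining adversarial, pixel, and identity-preservation terms. The modules G, D, and id_net are hypothetical placeholders, and the unweighted sum is a simplification of the weighted losses used in methods like [6], [7], [8].

```python
import torch
import torch.nn.functional as F

def frontalization_gan_step(G, D, id_net, profile, pose_code, frontal_gt):
    """One schematic generator update for identity-preserving frontalization."""
    fake_frontal = G(profile, pose_code)       # synthesize a frontal view
    # Adversarial term: the generator tries to make D score the fake as real.
    d_fake = D(fake_frontal)
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    # Pixel reconstruction against the ground-truth frontal image.
    pix = F.l1_loss(fake_frontal, frontal_gt)
    # Identity preservation: embeddings of fake and real frontal should match.
    ident = 1 - F.cosine_similarity(id_net(fake_frontal),
                                    id_net(frontal_gt)).mean()
    return adv + pix + ident                   # weighted sum in practice
```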

3D-based methods, in turn, employ facial landmark detection and 3D morphable models (3DMMs) to reconstruct a 3D face model, after which face models of arbitrary poses are transformed to frontal poses, as in joint face alignment and 3D face reconstruction [9] and the 3D-aided deep pose-invariant face recognition model [10]. Nevertheless, because these methods lose occluded textures, inpainting is usually employed to improve the synthesized facial image.

Pose-invariant representation learning seeks a common representation of the same identity across arbitrary poses, typically through subspace mapping or by disentangling pose variation from the identity representation. Face frontalization and pose-invariant representation extraction differ in where they operate: frontalization preprocesses the input image, whereas pose-invariant representation extraction post-processes the representation. Both aim to convert arbitrary-pose facial images or representations into frontal ones.

In this paper we propose the deep representation alignment network (DRA-Net), a representation alignment framework which incorporates a denoising autoencoder (DAE) and a novel deep representation transformation (DRT) block to learn identity-preserving representations. We assume that frontal and profile deep representations are misaligned due to pose discrepancy. In [11], the notion of representation equivariance concerns the correspondence between transformations of the input and of the representation: if a geometric transformation is applied to an input image, the representation undergoes a corresponding transformation. This means we can apply a function in the representation domain that reconstructs frontal faces from arbitrary poses. Based on this equivariance property, [12] speculates that the frontal and profile feature spaces are connected by a mapping and trains CNN models to learn the mapping from features of arbitrary poses to the frontal feature space.
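
A minimal sketch of the mapping idea in [12] follows: a pose-gated residual branch corrects a profile feature toward the frontal feature space. The feature dimension and the gating scheme here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualMapping(nn.Module):
    """f_frontal ~ f + yaw * R(f): a pose-gated residual correction."""
    def __init__(self, dim=512):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(dim, dim), nn.PReLU(), nn.Linear(dim, dim))

    def forward(self, feat, yaw):
        # yaw in [0, 1]: 0 for a frontal face (no correction), 1 for a
        # full profile, so the correction grows with pose angle.
        return feat + yaw.unsqueeze(1) * self.residual(feat)
```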

Inspired by these approaches, we formulate DRA-Net, which includes the DAE and the DRT block to align profile representations with frontal representations. The DAE, a carefully designed autoencoder, denoises pose-noisy features and recovers occluded features. The DRT block, which is based on 2D image transformation, aligns profile and frontal representations. By the equivariance property, DRA-Net performs reconstruction and transformation on a deep representation, analogous to performing these operations on an image. During training, DRA-Net learns the conversion between deep frontal and profile representations. Specifically, we use pairwise training and cosine loss to improve training stability, and we apply singular value decomposition (SVD) in the DRT block to reduce the number of parameters and prevent overfitting. DRA-Net achieves state-of-the-art performance on the LFW [13], YTF [14], Multi-PIE [15], CFP [16], IJB-A [4], and M2FPA [17] datasets.
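
To make the pipeline concrete, here is a minimal sketch of the two blocks and the pairwise cosine objective under stated assumptions: the layer sizes, the low-rank (SVD-inspired) parameterization of the DRT, and the exact wiring are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DRANetSketch(nn.Module):
    def __init__(self, dim=512, hidden=256, rank=64):
        super().__init__()
        # DAE: encode the pose-noisy feature, decode a recovered feature.
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.PReLU())
        self.dec = nn.Sequential(nn.Linear(hidden, dim), nn.PReLU())
        # DRT: a low-rank linear transform W = U @ V, echoing the SVD-based
        # parameter reduction described in the paper.
        self.U = nn.Parameter(torch.randn(dim, rank) * 0.01)
        self.V = nn.Parameter(torch.randn(rank, dim) * 0.01)

    def forward(self, profile_feat):
        recovered = self.dec(self.enc(profile_feat))   # DAE recovery
        aligned = recovered @ (self.U @ self.V)        # DRT alignment
        return aligned

def pairwise_cosine_loss(aligned_profile, frontal_feat):
    # Pairwise training: pull the aligned profile representation toward the
    # frontal representation of the same identity via cosine similarity.
    return (1 - F.cosine_similarity(aligned_profile, frontal_feat)).mean()
```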

Section snippets

Related work

In this section, we briefly review related work on pose-invariant face recognition.

Proposed method

In this section, we present the proposed DRA-Net, which learns the alignment between frontal and profile representations.

Experiments

We evaluate the proposed DRA-Net on six benchmarks for pose-invariant face recognition: (1) Labeled Faces in the Wild (LFW) [13], which contains many variations but mostly small head poses; (2) the YouTube Faces Database (YTF) [14], consisting of facial images from YouTube videos; (3) CMU Multi-PIE [15], a multi-view face recognition benchmark with large variations in illumination and expression, which provides two evaluation protocols, setting 1 and setting 2, for ...

Conclusion

The proposed DRA-Net aligns deep face representations across different poses, and the experimental results demonstrate its effectiveness for pose-invariant face recognition. To improve alignment robustness, we use the DAE to recover occluded features and the DRT block to transform deep profile representations into frontal representations. Furthermore, we combine cosine loss and pairwise training to mitigate profile-frontal discrepancies. Deep face representation alignment is accomplished in a lightweight ...

CRediT authorship contribution statement

Chun-Hsien Lin: Conceptualization, Methodology, Software, Writing - review & editing, Formal analysis, Investigation. Wei-Jia Huang: Methodology, Software, Writing - original draft, Data curation, Investigation. Bing-Fei Wu: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Ministry of Science and Technology under Grant MOST 108-2638-E-009-001-MY2.


References (51)

  • M. He et al., Deformable face net for pose invariant face recognition, Pattern Recogn. (2020)
  • J.-C. Chen et al., Unconstrained face verification using deep CNN features
  • W. Liu et al., SphereFace: Deep hypersphere embedding for face recognition
  • H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, W. Liu, CosFace: Large margin cosine loss for deep face...
  • J. Deng, J. Guo, N. Xue, S. Zafeiriou, ArcFace: Additive angular margin loss for deep face recognition, in: CVPR, 2019,...
  • B.F. Klare et al., Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A
  • I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative...
  • X. Yin et al., Disentangled representation learning GAN for pose-invariant face recognition
  • S. Zhang, Q. Miao, M. Huang, X. Zhu, Y. Chen, Z. Lei, J. Wang, Pose-weighted GAN for photorealistic face...
  • L. Liu et al., Progressive pose normalization generative adversarial network for frontal face synthesis and face recognition under large pose
  • F. Liu et al., Joint face alignment and 3D face reconstruction with application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2020)
  • J. Zhao, L. Xiong, Y. Cheng, Y. Cheng, J. Li, L. Zhou, Y. Xu, J. Karlekar, S. Pranata, S. Shen, J. Xing, S. Yan, J....
  • K. Lenc et al., Understanding image representations by measuring their equivariance and equivalence
  • K. Cao et al., Pose-robust face recognition via deep residual equivariant mapping
  • G.B. Huang et al., Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments, Technical Report (2007)
  • L. Wolf et al., Face recognition in unconstrained videos with matched background similarity
  • R. Gross, I. Matthews, J. Cohn, T. Kanade, S. Baker, Multi-PIE, in: ECCV,...
  • S. Sengupta et al., Frontal to profile face verification in the wild
  • P. Li, X. Wu, Y. Hu, R. He, Z. Sun, M2FPA: A multi-yaw multi-pitch high-quality dataset and benchmark for facial pose...
  • I. Gruber et al., Facing face recognition with ResNet: Round One
  • K. He et al., Deep residual learning for image recognition
  • S. Chen, Y. Liu, X. Gao, Z. Han, MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile...
  • X. Wu et al., A light CNN for deep face representation with noisy labels, IEEE Trans. Inf. Forensics Secur. (2018)
  • Y. Zhong et al., Toward end-to-end face recognition through alignment learning, IEEE Signal Process. Lett. (2017)
  • M. Jaderberg, K. Simonyan, A. Zisserman, K. Kavukcuoglu, Spatial transformer networks, in: NIPS,...

    Chun-Hsien Lin was born in Taipei, Taiwan. He received the B.S. degree in electrical and computer engineering and the M.S. degree in electrical and control engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree. His research interests include computer vision and machine learning, with a particular emphasis in deep face recognition, domain adaptation, and feature learning.

    Wei-Jia Huang received the B.S. degree in electrical engineering from National Central University and the M.S. degree from the Graduate Degree Program of Robotics, National Chiao Tung University, in 2018 and 2020, respectively. His research areas are face recognition, deep learning, and image processing.

    Bing-Fei Wu received the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1992. Dr. Wu is a Chair Professor in the Department of Electrical and Computer Engineering, National Yang Ming Chiao Tung University, Taiwan. He is a Fellow of IEEE and has served, since 2019, as the President of the Taiwan Association of System Science and Engineering and as the Director of the Control Engineering Program, Ministry of Science and Technology, Taiwan.
