Deep representation alignment network for pose-invariant face recognition
Introduction
In the computer vision field, face recognition has played an important role for decades. Due to the complexity of the problem, performance and flexibility were limited until the emergence of convolutional neural networks (CNNs). In recent years, although near-perfect frontal face recognition has become possible in controlled environments, in practice the performance of face recognition is still limited by uncontrollable factors such as illumination, expression, and large pose variation.
In this paper, we focus on the problem of pose variation. The corresponding research can be divided into improving the performance of stem CNN models and lessening the impact of pose variation. To enhance the performance of face recognition models, most methods adopt a margin penalty, such as SphereFace [1], CosFace [2], and ArcFace [3], to learn a more discriminative feature space. Although state-of-the-art models trained with these loss functions attain high accuracy on the IJB-A [4] benchmark, they still produce mismatched pairs containing frontal and profile faces of the same identity.
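As a concrete illustration of the margin-penalty idea shared by these losses, the sketch below applies an ArcFace-style additive angular margin to the target-class logit. The feature dimension, margin m, and scale s are illustrative defaults, not the settings of any particular model discussed here.

```python
import numpy as np

def margin_logits(embedding, weights, label, m=0.5, s=64.0):
    """ArcFace-style additive angular margin on the target-class logit.

    embedding: (d,) L2-normalized feature; weights: (C, d) L2-normalized
    class centers; label: ground-truth class index.
    """
    cos_theta = weights @ embedding              # cosine similarity to each class
    theta = np.arccos(np.clip(cos_theta[label], -1.0, 1.0))
    logits = cos_theta.copy()
    logits[label] = np.cos(theta + m)            # penalize the target-class angle
    return s * logits                            # scaled logits fed to softmax

rng = np.random.default_rng(0)
emb = rng.normal(size=8)
emb /= np.linalg.norm(emb)
W = rng.normal(size=(10, 8))
W /= np.linalg.norm(W, axis=1, keepdims=True)
logits = margin_logits(emb, W, label=3)
```

Because the margin increases only the target-class angle, the target logit is pushed down relative to the plain cosine, forcing the network to learn a larger angular separation between classes.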
As mentioned above, pose variation affects the performance of face recognition no matter how powerful the loss function is. Hence, researchers seek pose-invariant face recognition methods that learn identity representations from arbitrary poses of the same person. Such methods can be approached from two directions: face frontalization or pose-invariant representation learning.
The success of frontal face recognition suggests that face frontalization could be helpful for pose-invariant face recognition. However, as profile facial images include occluded face areas and thus lack face information, transforming or generating a frontal face from a profile face is difficult. Current methods fall into 2D image-based and 3D information-based groups. In the first group, 2D-based methods use generative adversarial networks (GANs) [5] that take facial images and head-pose information as input: a generator synthesizes a frontal facial image from an arbitrary-pose input in an effort to mislead a discriminator, as in the disentangled representation learning GAN [6], pose-weighted GAN [7], and progressive pose normalization GAN [8]. Once the generator has learned enough to fool the discriminator, its synthesized frontal facial images preserve the identity of the input.
Three-dimensional-based methods, in turn, employ facial landmark detection and 3D morphable models (3DMMs) to reconstruct a 3D face model, after which the face models of arbitrary poses are transformed to frontal poses, such as in joint face alignment and 3D face reconstruction [9] and the 3D-aided deep pose-invariant face recognition model [10]. Nevertheless, as these methods suffer the loss of occluded textures, inpainting is usually employed to improve the synthesized facial image.
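The rigid part of such 3D pose normalization can be sketched as follows: given an estimated head yaw, applying the inverse rotation maps posed 3D landmarks back to a frontal configuration. This is a toy example handling yaw only; a full 3DMM pipeline also estimates pitch and roll, fits shape and expression parameters, and inpaints occluded texture.

```python
import numpy as np

def frontalize_points(points, yaw):
    """Rotate 3D landmarks by the inverse of an estimated head yaw (radians).

    points: (N, 3) array of 3D landmark coordinates.
    """
    c, s = np.cos(-yaw), np.sin(-yaw)
    R_inv = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])   # inverse rotation about the y-axis
    return points @ R_inv.T

# A landmark posed at 45 degrees of yaw maps back to its frontal position.
yaw = np.deg2rad(45)
c, s = np.cos(yaw), np.sin(yaw)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
frontal = np.array([[0.1, 0.2, 1.0]])
posed = frontal @ R.T
recovered = frontalize_points(posed, yaw)
```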
Pose-invariant representation learning is based on a common representation of the same identity across arbitrary poses. Hence, such methods either map representations into a common subspace or disentangle pose variation from the identity representation. Face frontalization differs from pose-invariant representation extraction in where it operates: the former preprocesses the input image, whereas the latter post-processes the representation. Both aim to convert arbitrary-pose facial images or representations into frontal ones.
In this paper we propose the deep representation alignment network (DRA-Net), a representation alignment framework that incorporates a denoising autoencoder (DAE) and an innovative deep representation transformation (DRT) block to learn identity-preserving representations. We assume that frontal and profile deep representations are misaligned due to pose discrepancy. In [11], the notion of representation equivariance concerns the correspondence between input and representation: if an input image undergoes a geometric transformation, the representation changes in a corresponding way. We can therefore use a function that reconstructs frontal facial images from arbitrary poses in the representation domain. Exploiting this equivariance property, [12] speculates that the frontal and profile feature spaces are connected by a mapping and thus trains CNN models to learn the mapping function from features of arbitrary poses to the frontal feature space.
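A toy numerical sketch of this mapping idea, under the simplifying and purely hypothetical assumption that profile features are an exact linear transformation of frontal features: in that case the profile-to-frontal mapping can be recovered from paired representations by least squares.

```python
import numpy as np

# Synthetic "ground-truth" linear pose transformation A_true (an assumption
# for the demo; real pose effects on deep features are nonlinear).
rng = np.random.default_rng(1)
d = 16
A_true = rng.normal(size=(d, d)) * 0.3 + np.eye(d)

frontal = rng.normal(size=(200, d))      # frontal representations
profile = frontal @ A_true.T             # their "posed" counterparts

# Least-squares estimate of the profile -> frontal mapping M,
# i.e. argmin_M || profile @ M - frontal ||_F.
M, *_ = np.linalg.lstsq(profile, frontal, rcond=None)
aligned = profile @ M                    # profile features mapped back toward frontal
```

Methods such as [12] and the proposed DRA-Net learn far richer, nonlinear versions of this mapping with CNN layers, but the alignment objective is the same in spirit.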
Inspired by these approaches, we formulate DRA-Net, which includes the DAE and DRT blocks to align a profile representation with a frontal representation. The DAE, a carefully designed autoencoder, denoises pose-noisy features and recovers occluded features; the DRT block, based on 2D image transformation, aligns profile and frontal representations. Owing to the equivariance property, DRA-Net performs reconstruction and transformation on a deep representation much as these operations would be performed on an image. During training, DRA-Net learns the conversion between deep frontal and profile representations. Specifically, we use pairwise training and cosine loss to improve training stability, and we apply singular value decomposition (SVD) in the DRT block to reduce the number of parameters and prevent overfitting. DRA-Net achieves state-of-the-art performance on the LFW [13], YTF [14], Multi-PIE [15], CFP [16], IJB-A [4], and M2FPA [17] datasets.
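To illustrate the two ingredients named above, the sketch below shows a cosine loss between representation vectors and an SVD-based rank-r factorization that shrinks a d x d transformation to 2dr parameters. The matrix W here is synthetic, and the DRT block's actual architecture is not reproduced; this only demonstrates the parameter-reduction arithmetic.

```python
import numpy as np

def cosine_loss(a, b):
    """1 - cosine similarity: zero when the two representations align."""
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Low-rank factorization via SVD: replacing a d x d transformation with
# two d x r factors cuts parameters from d*d down to 2*d*r.
rng = np.random.default_rng(2)
d, r = 64, 8
W = rng.normal(size=(d, d))
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_low = (U[:, :r] * S[:r]) @ Vt[:r]      # best rank-r approximation of W

params_full = d * d                      # 4096 parameters
params_low = 2 * d * r                   # 1024 parameters
```

With d = 64 and r = 8 the factorized form uses a quarter of the parameters, which is the kind of saving that motivates applying SVD inside a transformation block.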
Related work
In this section, we briefly review the related work on pose-invariant face recognition.
Proposed method
In this section, we present the proposed DRA-Net, which learns the alignment between frontal and profile representations.
Experiments
We evaluate the proposed DRA-Net on six benchmarks for pose-invariant face recognition: (1) Labeled Faces in the Wild (LFW) [13], which contains diverse variations, though mostly small head poses; (2) the YouTube Faces Database (YTF) [14], consisting of facial images from YouTube videos; (3) CMU Multi-PIE [15], a multi-view face recognition benchmark with large variations in illumination and expression, which provides two protocols, setting 1 and setting 2, for
Conclusion
The proposed DRA-Net aligns deep face representations across poses, and the experimental results demonstrate its effectiveness for pose-invariant face recognition. To improve alignment robustness, we use the DAE to recover occluded features and the DRT block to transform deep profile representations into frontal ones. Furthermore, we combine cosine loss and pairwise training to mitigate profile–frontal discrepancies. Deep face representation alignment is accomplished in a lightweight
CRediT authorship contribution statement
Chun-Hsien Lin: Conceptualization, Methodology, Software, Writing - review & editing, Formal analysis, Investigation. Wei-Jia Huang: Methodology, Software, Writing - original draft, Data curation, Investigation. Bing-Fei Wu: Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the Ministry of Science and Technology under Grant MOST 108-2638-E-009-001-MY2.
Chun-Hsien Lin was born in Taipei, Taiwan. He received the B.S. degree in electrical and computer engineering and the M.S. degree in electrical and control engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree. His research interests include computer vision and machine learning, with a particular emphasis on deep face recognition, domain adaptation, and feature learning.
References (51)
- Deformable face net for pose invariant face recognition, Pattern Recognition (2020).
- Unconstrained face verification using deep CNN features.
- SphereFace: Deep hypersphere embedding for face recognition.
- H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, W. Liu, CosFace: Large margin cosine loss for deep face…
- J. Deng, J. Guo, N. Xue, S. Zafeiriou, ArcFace: Additive angular margin loss for deep face recognition, in: CVPR, 2019.
- Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A.
- I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative…
- Disentangled representation learning GAN for pose-invariant face recognition.
- S. Zhang, Q. Miao, M. Huang, X. Zhu, Y. Chen, Z. Lei, J. Wang, Pose-weighted GAN for photorealistic face…
- Progressive pose normalization generative adversarial network for frontal face synthesis and face recognition under large pose.
- Joint face alignment and 3D face reconstruction with application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell.
- Understanding image representations by measuring their equivariance and equivalence.
- Pose-robust face recognition via deep residual equivariant mapping.
- Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, Technical Report.
- Face recognition in unconstrained videos with matched background similarity.
- Frontal to profile face verification in the wild.
- Facing face recognition with ResNet: Round One.
- Deep residual learning for image recognition.
- A light CNN for deep face representation with noisy labels, IEEE Trans. Inf. Forensics Secur.
- Toward end-to-end face recognition through alignment learning, IEEE Signal Process. Lett.
Wei-Jia Huang received the B.S. degree in electrical engineering from National Central University and the M.S. degree from the Graduate Degree Program of Robotics, National Chiao Tung University, in 2020 and 2018, respectively. His research areas are face recognition, deep learning, and image processing.
Bing-Fei Wu received the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1992. He is a Chair Professor in the Department of Electrical and Computer Engineering, National Yang Ming Chiao Tung University, Taiwan. He is an IEEE Fellow and, since 2019, has served as President of the Taiwan Association of System Science and Engineering and as Director of the Control Engineering Program, Ministry of Science and Technology, Taiwan.