Abstract
Deep learning algorithms have dramatically improved 2D face recognition, which makes 3D face recognition more promising, since 3D faces are widely acknowledged to be more discriminative than 2D faces. Most deep learning algorithms for 3D face recognition are based on depth information. However, it is difficult to acquire enough depth data to meet the requirements of deep learning. Many approaches reconstruct 3D faces from 2D faces, but the depths derived from these reconstructed faces are too coarse to ensure satisfactory recognition. Therefore, this paper uses real 3D depth data to search for an optimal mapping from 2D faces to facial depths with a well-trained CycleGAN network, which can greatly assist 3D face recognition. To make the CycleGAN network more recognition-oriented, an identical cycle consistency loss is employed in place of the original cycle consistency loss in CycleGAN, which typically takes the form of a pixel loss. With two perceptual losses enforced at both ends, CycleGAN is required to preserve as much identity information as possible in both the forward and backward cycles. Furthermore, another perceptual loss is incorporated to ensure that the mapped depth preserves the same identity as the corresponding real 3D depth. To improve the generalizability of the model when data are insufficient, this paper uses massive reconstructed depth data to pre-train the U-Nets in the network. The modified CycleGAN network is then further trained on the ND-2006 dataset (13450 images) to obtain the final U-Net. Extensive experiments on multiple datasets show that the proposed method is practicable and effective.
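The three perceptual terms described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine distance between identity embeddings, and the treatment of the generators and embedding networks as plain callables on lists of floats. The abstract states only that perceptual losses replace the pixel-wise cycle loss and that a third perceptual loss ties the mapped depth to the real depth.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Mapping = Callable[[Vector], Vector]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 0 means the two embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero embedding as carrying no identity information
    return 1.0 - dot / (norm_a * norm_b)


def identity_cycle_losses(
    face: Vector,
    depth: Vector,
    face_to_depth: Mapping,   # hypothetical generator: 2D face -> depth
    depth_to_face: Mapping,   # hypothetical generator: depth -> 2D face
    embed_face: Mapping,      # hypothetical identity embedding for 2D faces
    embed_depth: Mapping,     # hypothetical identity embedding for depth maps
) -> Tuple[float, float, float]:
    """Compute the three identity terms for one (face, real depth) pair."""
    mapped_depth = face_to_depth(face)
    recovered_face = depth_to_face(mapped_depth)             # forward cycle
    recovered_depth = face_to_depth(depth_to_face(depth))    # backward cycle

    # Perceptual loss at the face end: identity after the forward cycle.
    forward = cosine_distance(embed_face(face), embed_face(recovered_face))
    # Perceptual loss at the depth end: identity after the backward cycle.
    backward = cosine_distance(embed_depth(depth), embed_depth(recovered_depth))
    # Extra perceptual loss: mapped depth versus the corresponding real depth.
    paired = cosine_distance(embed_depth(depth), embed_depth(mapped_depth))
    return forward, backward, paired
```

In a real training loop these terms would be weighted and added to the adversarial losses; the weights, like the embedding networks, are not specified in the abstract and would have to be taken from the full paper.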
Acknowledgements
This paper is supported by the National Natural Science Foundation of China (Nos. U19B2040, 11731013, 11991022), the Strategic Priority Research Program of Chinese Academy of Sciences (Nos. XDA27010100, XDA27010302), the Fundamental Research Funds for the Central Universities, and Shenzhen Magic Intelligence Technology Co., Ltd.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Hanqin Chen and Yao Yan contributed equally to this work.
About this article
Cite this article
Chen, H., Yan, Y., Qin, J. et al. Recognition-oriented facial depth estimation from a single image. Appl Intell 53, 1807–1825 (2023). https://doi.org/10.1007/s10489-022-03560-x
DOI: https://doi.org/10.1007/s10489-022-03560-x