Recognition-oriented facial depth estimation from a single image

Applied Intelligence

Abstract

Deep learning algorithms have dramatically improved 2D face recognition, which makes 3D face recognition more promising, as 3D faces are widely acknowledged to be more discriminative than 2D faces. Most deep learning algorithms for 3D face recognition are based on depth information. However, it is difficult to acquire enough depth data to meet the requirements of deep learning. Many approaches reconstruct 3D faces from 2D faces, but the depths derived from these reconstructed faces are too coarse to ensure satisfactory recognition. Therefore, this paper uses real 3D depth data to search for an optimal mapping from 2D faces to facial depths with a well-trained CycleGAN network, which can greatly assist 3D face recognition. To make the CycleGAN network more recognition-oriented, an identical cycle consistency loss is employed in place of the standard CycleGAN cycle consistency loss, which typically takes the form of a pixel loss. With two perceptual losses enforced at both ends, CycleGAN is required to preserve as much identity information as possible in both the forward and backward cycles. Furthermore, another perceptual loss is incorporated to ensure that the mapped depth preserves the same identity as the corresponding real 3D depth. To increase the generalizability of the model when data are insufficient, this paper uses massive reconstructed depth data to pre-train the U-Nets therein. The modified CycleGAN network is then further trained on the ND-2006 dataset (13,450 images) to obtain the final U-Net network. Extensive experiments on multiple datasets show that the proposed method is practicable and effective.
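
The difference between a pixel-level cycle loss and the perceptual losses described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the squared-distance form of the perceptual loss, the equal weighting of the three terms, the reuse of one feature extractor for faces and depths, and every name below (`pixel_loss`, `perceptual_loss`, `recognition_oriented_loss`, `toy_embed`) are assumptions made for illustration, and the adversarial terms of CycleGAN are omitted.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]                     # flattened toy "image"
Embed = Callable[[Image], List[float]]      # stand-in for a frozen recognition network
Generator = Callable[[Image], List[float]]  # stand-in for a U-Net generator


def pixel_loss(a: Image, b: Image) -> float:
    """Mean absolute pixel difference, the usual form of a cycle consistency loss."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def perceptual_loss(a: Image, b: Image, embed: Embed) -> float:
    """Mean squared distance between identity features (assumed form)."""
    fa, fb = embed(a), embed(b)
    return sum((p - q) ** 2 for p, q in zip(fa, fb)) / len(fa)


def recognition_oriented_loss(face: Image, depth: Image,
                              g: Generator, f: Generator, embed: Embed) -> float:
    """Sum of the three perceptual terms named in the abstract (equal weights assumed)."""
    forward = perceptual_loss(face, f(g(face)), embed)     # face -> depth -> face
    backward = perceptual_loss(depth, g(f(depth)), embed)  # depth -> face -> depth
    paired = perceptual_loss(g(face), depth, embed)        # mapped depth vs. real depth
    return forward + backward + paired


def toy_embed(img: Image) -> List[float]:
    """Hypothetical feature extractor: removes the mean, so a uniform shift is ignored."""
    mean = sum(img) / len(img)
    return [v - mean for v in img]


face = [0.1, 0.5, 0.9, 0.3]
reconstruction = [v + 0.2 for v in face]  # same structure, uniformly brighter

print(pixel_loss(face, reconstruction))                  # penalises the shift
print(perceptual_loss(face, reconstruction, toy_embed))  # close to zero
```

In this toy setting the pixel loss penalises a change that leaves the extracted features untouched, while the perceptual loss does not, which is the motivation the abstract gives for replacing the pixel-based cycle consistency loss.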

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (Nos. U19B2040, 11731013, 11991022), the Strategic Priority Research Program of Chinese Academy of Sciences (Nos. XDA27010100, XDA27010302), the Fundamental Research Funds for the Central Universities, and Shenzhen Magic Intelligence Technology Co., Ltd.

Author information

Corresponding author

Correspondence to Tong Zhao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Hanqin Chen and Yao Yan contributed equally to this work.

About this article

Cite this article

Chen, H., Yan, Y., Qin, J. et al. Recognition-oriented facial depth estimation from a single image. Appl Intell 53, 1807–1825 (2023). https://doi.org/10.1007/s10489-022-03560-x

  • DOI: https://doi.org/10.1007/s10489-022-03560-x
