
Learning-detailed 3D face reconstruction based on convolutional neural networks from a single image

  • Original Article
Neural Computing and Applications

Abstract

Convolutional neural networks (CNNs) have made it practical to reconstruct a 3D face from a single input image, and they perform well at generating detailed face geometry. The success of CNN-based techniques depends on large-scale labelled data. However, no publicly available dataset provides a sufficiently large number of face images annotated with the corresponding 3D face geometry. State-of-the-art learning-based 3D face reconstruction methods therefore synthesize their training data from a coarse morphable face model, which yields non-photo-realistic face images. In this article, we propose a novel data-generation technique that uses learning-based inverse face rendering to render a large number of photo-realistic face images with distinct properties. On the resulting datasets, which are built for real-time, fine-scale textured 3D face reconstruction, we train two cascaded CNNs in a coarse-to-fine manner. The networks are trained for detailed 3D face reconstruction from a single image. Experimental results demonstrate that our method efficiently reconstructs 3D face shapes with geometric details from only one input image. Furthermore, the results demonstrate that our technique remains effective under variations in pose, expression and lighting.
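The coarse-to-fine cascade mentioned in the abstract can be illustrated with a minimal sketch of the data flow only. The abstract does not state what each network outputs, so the sketch assumes, purely for illustration, that both stages work on a small depth-like grid. The names `coarse_stage`, `fine_stage`, `reconstruct` and the parameter `gain` are hypothetical, and simple arithmetic stands in for the trained CNNs.

```python
# Illustrative sketch only: stand-in functions, NOT the networks from the article.
from typing import List, Tuple

Grid = List[List[float]]


def coarse_stage(image: Grid) -> Grid:
    """Hypothetical coarse predictor.

    A 3x3 box blur stands in for a CNN that estimates smooth,
    low-frequency geometry from the input image.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the neighbourhood, clipped at the borders.
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def fine_stage(image: Grid, coarse: Grid, gain: float = 0.5) -> Grid:
    """Hypothetical fine predictor.

    Returns a residual (detail) map computed from the image and the
    coarse estimate; a trained CNN would take the place of this formula.
    """
    return [[gain * (image[y][x] - coarse[y][x])
             for x in range(len(image[0]))]
            for y in range(len(image))]


def reconstruct(image: Grid) -> Tuple[Grid, Grid]:
    """Run the cascade: the coarse estimate first, then add the fine residual."""
    coarse = coarse_stage(image)
    detail = fine_stage(image, coarse)
    refined = [[c + d for c, d in zip(c_row, d_row)]
               for c_row, d_row in zip(coarse, detail)]
    return coarse, refined


if __name__ == "__main__":
    # A single sharp feature on an otherwise flat 3x3 input.
    spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    coarse, refined = reconstruct(spike)
    print(coarse[1][1], refined[1][1])
```

The point of the design is that the second stage only has to predict a correction on top of the first stage's output, so a flat input passes through the cascade unchanged while a sharp feature smoothed away by the coarse stage is partly restored.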

Notes

  1. Both fine-scale and coarse-scale photo-realistic face image datasets will be publicly available once the present work is published.

  2. https://github.com/unibas-gravis/scalismo-faces.

  3. https://github.com/waps101/3DMMedges.

Acknowledgements

The authors would like to thank the anonymous reviewers for their careful reading of this article and for their comments, which led to a number of improvements. S. Hayat and A. Ullah are supported by the Higher Education Commission, Pakistan, under grant number 20-11682/NRPU/RGM/R&D/HEC/2020. A. Khan was supported by the National Natural Science Foundation of China (Grant No. 61772164).

Author information

Corresponding authors

Correspondence to Asad Khan or Sakander Hayat.

Ethics declarations

Conflict of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

About this article

Cite this article

Khan, A., Hayat, S., Ahmad, M. et al. Learning-detailed 3D face reconstruction based on convolutional neural networks from a single image. Neural Comput & Applic 33, 5951–5964 (2021). https://doi.org/10.1007/s00521-020-05373-w

  • DOI: https://doi.org/10.1007/s00521-020-05373-w
