Abstract
Convolutional neural networks (CNNs) have made 3D face reconstruction from a single input image practical, with strong performance in generating detailed face geometry. The success of CNN-based techniques depends on large-scale labelled data. However, no publicly available dataset provides a large quantity of face images annotated with the corresponding 3D face geometry. State-of-the-art learning-based 3D face reconstruction methods synthesize their training data from a coarse morphable face model, which yields non-photo-realistic face images. In this article, we propose a novel data-generation technique based on learning-based inverse face rendering, which renders a large number of photo-realistic face images with distinct properties. Using the resulting well-constructed datasets, we train two cascaded CNNs in a coarse-to-fine manner for real-time, fine-scale textured 3D face reconstruction. The networks are trained to reconstruct detailed 3D faces from a single image. Experimental results demonstrate that our method efficiently reconstructs 3D face shapes with geometric details from only one input image. Furthermore, the results demonstrate that our technique remains effective under variations in pose, expression and lighting.
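For readers unfamiliar with the term, a linear morphable face model of the kind the abstract refers to can be sketched as a mean shape plus a weighted sum of basis shapes. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `morphable_face`, the array sizes, and the random toy data are all hypothetical.

```python
import numpy as np


def morphable_face(mean_shape, basis, coeffs):
    """Return an (n_vertices, 3) mesh: mean shape plus a linear combination of basis shapes.

    mean_shape: (3 * n_vertices,) flattened mean face
    basis:      (3 * n_vertices, n_components) shape basis, one column per component
    coeffs:     (n_components,) weights selecting a particular face
    """
    flat = mean_shape + basis @ coeffs
    return flat.reshape(-1, 3)


# Toy data standing in for a real face model (hypothetical sizes, random values).
rng = np.random.default_rng(0)
n_vertices, n_components = 5, 2
mean_shape = rng.normal(size=3 * n_vertices)
basis = rng.normal(size=(3 * n_vertices, n_components))

# Zero coefficients reproduce the mean face; non-zero coefficients deform it.
neutral = morphable_face(mean_shape, basis, np.zeros(n_components))
varied = morphable_face(mean_shape, basis, np.array([0.5, -1.0]))
```

Because such a model has only a small number of coefficients, it can represent overall face shape but not fine geometric detail, which is why the abstract calls it coarse.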
Notes
Both the fine-scale and coarse-scale photo-realistic face image datasets will be made publicly available once the present work is published.
Acknowledgements
The authors would like to thank the anonymous reviewers for their careful reading of this article and for their comments, which led to a number of improvements. S. Hayat and A. Ullah are supported by the Higher Education Commission, Pakistan under grant number 20-11682/NRPU/RGM/R&D/HEC/2020. A. Khan was supported by the National Natural Science Foundation of China (Grant No. 61772164).
Ethics declarations
Conflict of Interests
The authors declare that there is no conflict of interest regarding the publication of this article.
About this article
Cite this article
Khan, A., Hayat, S., Ahmad, M. et al. Learning-detailed 3D face reconstruction based on convolutional neural networks from a single image. Neural Comput & Applic 33, 5951–5964 (2021). https://doi.org/10.1007/s00521-020-05373-w
DOI: https://doi.org/10.1007/s00521-020-05373-w