DOI: 10.1145/3503161.3547800
research-article

End-to-End 3D Face Reconstruction with Expressions and Specular Albedos from Single In-the-wild Images

Published: 10 October 2022

ABSTRACT

Recovering 3D face models from in-the-wild face images has numerous potential applications. However, properly modeling complex real-world lighting effects, including specular lighting, shadows, and occlusions, from a single in-the-wild face image remains a wide-open research challenge. In this paper, we propose a convolutional neural network based framework that regresses a face model from a single in-the-wild image. The output face model includes dense 3D shape, head pose, expression, diffuse albedo, specular albedo, and the corresponding lighting conditions. Our approach uses novel hybrid loss functions to disentangle face shape identity, expression, pose, albedo, and lighting. In addition to a carefully designed ablation study, we conduct direct comparison experiments to show that our method can outperform state-of-the-art methods both quantitatively and qualitatively.
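To make the distinction between diffuse and specular albedo concrete, here is a minimal sketch of how the two quantities can combine with lighting to give an observed color at one surface point. The abstract does not state which reflectance or lighting model the paper uses. The Blinn-Phong model, the single directional light, the `shininess` value, and all names below (`SurfacePoint`, `shade`) are assumptions chosen for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


@dataclass
class SurfacePoint:
    """Hypothetical per-point quantities a face model might provide."""
    normal: Vec3
    diffuse_albedo: Vec3   # RGB, view-independent reflectance
    specular_albedo: Vec3  # RGB, strength of the view-dependent highlight


def shade(point: SurfacePoint, light_dir: Vec3, view_dir: Vec3,
          light_rgb: Vec3 = (1.0, 1.0, 1.0), shininess: float = 20.0) -> Vec3:
    """Blinn-Phong color (an assumed model): light * (kd * N.L + ks * (N.H)^s)."""
    n = normalize(point.normal)
    l = normalize(light_dir)
    v = normalize(view_dir)
    n_dot_l = max(dot(n, l), 0.0)
    half = tuple(a + b for a, b in zip(l, v))
    # No highlight when the point faces away from the light or the
    # half vector degenerates (light exactly opposite the viewer).
    if n_dot_l > 0.0 and any(half):
        spec = max(dot(n, normalize(half)), 0.0) ** shininess
    else:
        spec = 0.0
    return tuple(
        lc * (kd * n_dot_l + ks * spec)
        for lc, kd, ks in zip(light_rgb, point.diffuse_albedo, point.specular_albedo)
    )


skin = SurfacePoint(normal=(0.0, 0.0, 1.0),
                    diffuse_albedo=(0.6, 0.4, 0.3),
                    specular_albedo=(0.2, 0.2, 0.2))
frontal = shade(skin, light_dir=(0.0, 0.0, 1.0), view_dir=(0.0, 0.0, 1.0))
backlit = shade(skin, light_dir=(0.0, 0.0, -1.0), view_dir=(0.0, 0.0, 1.0))
```

In this toy setup the diffuse term depends only on the angle between the normal and the light, while the specular term also depends on the viewing direction. A model that estimates a single albedo cannot attribute a highlight to the surface itself and has to absorb it into texture or lighting, which is one way to read the motivation for regressing the two albedos separately.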

Published in

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
