ABSTRACT
With the rise of the "metaverse" concept, digital human technology has attracted widespread attention, and rapidly modeling a high-precision 3D human face has become a research focus in the field of digital humans. To address the inability of 3D Morphable Models (3DMM) to reconstruct facial details, this paper proposes an animatable facial detail generation algorithm based on a generative adversarial loss, which restores fine surface relief on the base face obtained from 3D facial reconstruction, achieving highly realistic reconstruction of digital human faces. First, this paper employs displacement mapping to recover facial detail: an autoencoder network predicts a displacement map from the input image, which perturbs the vertices of the 3D model to express finer relief. Second, to effectively capture mid-to-high-frequency facial details, a generative adversarial loss is introduced to model high-frequency differences between images; it is jointly constrained with photometric and other content losses to ensure the accuracy of the generated details. Finally, to decouple static from dynamic facial details, this paper embeds the expression-related 3DMM parameters into the displacement-map generator and adopts an identity-averaging strategy that constrains the encoder to model only the static details of the face, while the dynamic details are driven by the 3DMM parameters, enabling expression-driven generation of dynamic facial details. Experimental results show that the proposed algorithm is competitive in reconstruction quality and accuracy with other state-of-the-art facial detail generation algorithms.
Additionally, expression transfer experiments verify the model's ability to construct dynamic facial details, and ablation experiments confirm the importance of the generative adversarial loss.
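The displacement-mapping step described above can be sketched in a few lines: each vertex of the base 3DMM mesh is moved along its normal by a signed scalar sampled from the predicted displacement map at the vertex's UV coordinate. The sketch below is illustrative only and is not the paper's implementation; the function name, nearest-neighbour sampling, and array shapes are assumptions (a real pipeline would use bilinear or differentiable sampling, e.g. during training).

```python
import numpy as np

def apply_displacement(vertices, normals, uvs, disp_map):
    """Displace each vertex along its unit normal by the value sampled
    from a displacement map at the vertex's UV coordinate.

    vertices: (N, 3) base-mesh positions from the 3DMM fit
    normals:  (N, 3) unit vertex normals
    uvs:      (N, 2) UV coordinates in [0, 1]
    disp_map: (H, W) signed scalar displacement values
    """
    h, w = disp_map.shape
    # Nearest-neighbour UV sampling (bilinear sampling would be used in practice).
    px = np.clip(np.round(uvs[:, 0] * (w - 1)).astype(int), 0, w - 1)
    py = np.clip(np.round(uvs[:, 1] * (h - 1)).astype(int), 0, h - 1)
    d = disp_map[py, px]                      # (N,) per-vertex signed offsets
    return vertices + d[:, None] * normals    # move each vertex along its normal
```

Because the displacement acts along the normal of the smooth base mesh, the coarse shape and expression remain controlled by the 3DMM, while the map only adds fine relief such as wrinkles.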