
Multi-modal Feature Guided Detailed 3D Face Reconstruction from a Single Image

Conference paper in Pattern Recognition and Computer Vision (PRCV 2023)

Abstract

Reconstructing a 3D face model with high-quality geometry and texture from a single face image is an ill-posed and challenging problem. On the one hand, many methods rely heavily on large amounts of training data, which are difficult to obtain. On the other hand, local positional features of the face surface cannot capture the global information of the entire face. Because of these challenges, existing methods can hardly reconstruct detailed geometry and realistic textures. To address these issues, we propose a multi-modal feature guided 3D face reconstruction method, named MMFG, which requires no training data and generates detailed geometry from a single image. Specifically, we represent the reconstructed 3D face as a signed distance field and combine the local positional feature with multi-modal global features to reconstruct a detailed 3D face. To obtain region-aware information, a Swin Transformer serves as the global feature extractor, producing multi-modal global features from rendered multi-view RGB and depth images. Furthermore, considering the different effects of RGB and depth information on albedo and shading, we use the global features of the two modalities to guide the recovery of the respective BRDF components during differentiable rendering. Experimental results demonstrate that the proposed method generates more detailed 3D faces, achieving state-of-the-art results on texture reconstruction and competitive results on shape reconstruction on the NoW dataset.

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61976173, the Shaanxi Fundamental Science Research Project for Mathematics and Physics under Grant No. 22JSY011, and the MoE-CMCC Artificial Intelligence Project under Grant No. MCM20190701.
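
The conditioning scheme described in the abstract can be made concrete with a short sketch: a DeepSDF-style MLP predicts a signed distance at each query point, conditioned on global codes that Swin encoders extract from rendered RGB and depth views. This is a minimal sketch under assumed choices, not the authors' implementation: the class name `ConditionedFaceSDF`, the layer sizes, the use of torchvision's `swin_t` backbone, the view-averaged 768-d global codes, and feeding raw xyz coordinates in place of the paper's local positional feature are all assumptions.

```python
# Minimal sketch (not the authors' code): a DeepSDF-style MLP whose signed
# distance prediction is conditioned on global codes extracted from rendered
# multi-view RGB and depth images by Swin-T backbones. All names, dimensions,
# and the choice of torchvision's swin_t are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.models import swin_t

class ConditionedFaceSDF(nn.Module):
    def __init__(self, global_dim: int = 768, hidden: int = 256):
        super().__init__()
        # One Swin-T encoder per modality; dropping the classification
        # head leaves a 768-d global code per input view.
        self.enc_rgb = swin_t(weights=None)
        self.enc_rgb.head = nn.Identity()
        self.enc_depth = swin_t(weights=None)  # depth maps replicated to 3 channels
        self.enc_depth.head = nn.Identity()
        # SDF head: raw xyz stands in for the paper's local positional feature.
        self.sdf = nn.Sequential(
            nn.Linear(3 + 2 * global_dim, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),  # signed distance at the query point
        )

    def forward(self, xyz, rgb_views, depth_views):
        # xyz: (N, 3) query points; *_views: (V, 3, 224, 224) rendered views.
        g_rgb = self.enc_rgb(rgb_views).mean(dim=0)    # (768,), averaged over views
        g_depth = self.enc_depth(depth_views).mean(dim=0)
        cond = torch.cat([g_rgb, g_depth]).expand(xyz.shape[0], -1)
        return self.sdf(torch.cat([xyz, cond], dim=-1))  # (N, 1)

# Usage: 4 rendered views per modality, 1024 query points near the surface.
model = ConditionedFaceSDF()
sdf_vals = model(torch.rand(1024, 3),
                 torch.rand(4, 3, 224, 224),
                 torch.rand(4, 3, 224, 224))
```

The modality-specific part of the method, in which the RGB-derived and depth-derived codes separately guide albedo and shading recovery during differentiable rendering, is omitted here; the sketch only illustrates how global multi-modal features can condition the signed distance field.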

Author information

Correspondence to Huibin Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, J., Yu, C., Li, H. (2024). Multi-modal Feature Guided Detailed 3D Face Reconstruction from a Single Image. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_29

  • DOI: https://doi.org/10.1007/978-981-99-8432-9_29

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8431-2

  • Online ISBN: 978-981-99-8432-9

  • eBook Packages: Computer Science, Computer Science (R0)
