FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis

  • Conference paper
  • In: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Advances in deep implicit modeling and articulated models have significantly improved the digitization of 3D human figures from a single image. While state-of-the-art methods have greatly improved geometric precision, accurately inferring texture remains a challenge, particularly in occluded regions such as the back of a person in frontal-view images. This limitation in texture prediction largely stems from the scarcity of large-scale, diverse 3D datasets, whereas their 2D counterparts are abundant and easily accessible. To address this issue, our paper proposes leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization. We incorporate 2D priors from the fashion dataset to learn the occluded back view, refined with our proposed domain alignment strategy. We then fuse this information with the input image to obtain a fully textured mesh of the given person. Through extensive experiments on standard 3D human benchmarks, we demonstrate the superior performance of our approach in terms of both texture and geometry. Code and dataset are available at https://github.com/humansensinglab/FAMOUS.

C. Haene and J.-C. Bazin: Independent Researchers.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Vishnu Mani Hema.

Editor information

Editors and Affiliations

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 23996 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hema, V.M., Aich, S., Haene, C., Bazin, JC., De la Torre, F. (2025). FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15140. Springer, Cham. https://doi.org/10.1007/978-3-031-73007-8_4

  • DOI: https://doi.org/10.1007/978-3-031-73007-8_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73006-1

  • Online ISBN: 978-3-031-73007-8

  • eBook Packages: Computer Science, Computer Science (R0)
