Abstract
Advances in deep implicit modeling and articulated models have significantly improved the digitization of 3D human figures from a single image. While state-of-the-art methods have greatly improved geometric precision, accurately inferring texture remains challenging, particularly in occluded regions such as the back of a person in frontal-view images. This limitation in texture prediction largely stems from the scarcity of large-scale, diverse 3D datasets, whereas their 2D counterparts are abundant and easily accessible. To address this issue, we propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization. We incorporate 2D priors from the fashion dataset to learn the occluded back view, refined with our proposed domain-alignment strategy. We then fuse this information with the input image to obtain a fully textured mesh of the given person. Through extensive experiments on standard 3D human benchmarks, we demonstrate the superior performance of our approach in terms of both texture and geometry. Code and dataset are available at https://github.com/humansensinglab/FAMOUS.
C. Haene and J.-C. Bazin: Independent Researchers.
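For readers who want a concrete picture of the three-stage pipeline the abstract outlines (back-view synthesis from 2D fashion priors, domain alignment, and fusion into a textured result), the following is a minimal, purely illustrative Python sketch. Every function here is a hypothetical placeholder standing in for a learned component; none of these names come from the authors' released code.

```python
import numpy as np

def synthesize_back_view(front_rgb: np.ndarray) -> np.ndarray:
    """Stand-in for the 2D-prior network that predicts the occluded
    back view from a frontal image (conceptually, a person-synthesis
    model trained on a large 2D fashion dataset). As a placeholder,
    we simply mirror the input horizontally."""
    return front_rgb[:, ::-1, :]

def align_domain(back_rgb: np.ndarray) -> np.ndarray:
    """Stand-in for the proposed domain-alignment step that narrows
    the gap between the 2D fashion-image domain and renderings of
    3D scans. Identity mapping used as a placeholder."""
    return back_rgb

def fuse_views(front_rgb: np.ndarray, back_rgb: np.ndarray) -> dict:
    """Stand-in for fusing the input front view with the predicted
    back view into a fully textured mesh; here we only bundle the
    two views together."""
    return {"front": front_rgb, "back": back_rgb}

if __name__ == "__main__":
    # Dummy 512x512 RGB array standing in for the single frontal photo.
    front = np.random.rand(512, 512, 3).astype(np.float32)
    back = align_domain(synthesize_back_view(front))
    textured = fuse_views(front, back)
    print({k: v.shape for k, v in textured.items()})
```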
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hema, V.M., Aich, S., Haene, C., Bazin, J.-C., De la Torre, F. (2025). FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15140. Springer, Cham. https://doi.org/10.1007/978-3-031-73007-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73006-1
Online ISBN: 978-3-031-73007-8
eBook Packages: Computer Science, Computer Science (R0)