
Towards Finer Human Reconstruction for Single RGB-D Images

  • Conference paper
Advances in Computer Graphics (CGI 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15339)


Abstract

Existing methods for parametric-model-assisted human surface reconstruction from single RGB-D images still struggle to produce fine results. This article proposes an improved method with three tactics to overcome this limitation. First, a direct optimization scheme is adopted to refine the parametric model for a better back prior, since an inaccurately estimated model degrades reconstruction performance. Second, a new encoder-decoder-structured, residual-feature-based back refinement network is proposed to further polish the initial back surface; it preserves global human shapes and poses without missing body parts while keeping local details. A learnable-weight-based cross-attention module (LCA) is embedded in this network, which adaptively merges the high-level residual features from both the SMPL-X and initial back depths via cross-attention to recover rich details. Third, a new silhouette loss on both the front and back surfaces is introduced, so that fine back surfaces with a smooth transition between front and back can be achieved. With these three tactics, a novel framework is obtained for robust surface reconstruction from single RGB-D images. Experimental results show that the proposed approach obtains surfaces with significant details and without missing parts.
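The abstract describes the LCA module only at a high level: residual features from the SMPL-X prior and the initial back depth are merged via cross-attention under a learnable weight. As a rough illustrative sketch only (not the authors' implementation; `lca_fuse`, the projection matrices, and the blending scalar `alpha` are all hypothetical names), such a fusion might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lca_fuse(back_feat, smplx_feat, w_q, w_k, w_v, alpha):
    """Hypothetical LCA-style fusion: features from the initial back depth
    attend to the SMPL-X prior features, and a learnable scalar weight
    `alpha` blends the attended result into the residual stream."""
    q = back_feat @ w_q              # queries from the initial back depth
    k = smplx_feat @ w_k             # keys from the SMPL-X prior
    v = smplx_feat @ w_v             # values from the SMPL-X prior
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N_back, N_smplx)
    return alpha * (attn @ v) + (1.0 - alpha) * back_feat
```

In a trained network both the projections and `alpha` would be learned parameters; with `alpha = 0` the module degenerates to an identity on the back-depth features, which is one way such a residual blend stays stable early in training.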



Acknowledgement

This work was supported by the Key Natural Science Fund of the Department of Education of Anhui Province (KJ2021A0042).

Author information

Correspondence to Xianyong Fang.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, Y., Qian, Y., Dai, R., Wang, L., Liu, Z., Fang, X. (2025). Towards Finer Human Reconstruction for Single RGB-D Images. In: Magnenat-Thalmann, N., Kim, J., Sheng, B., Deng, Z., Thalmann, D., Li, P. (eds) Advances in Computer Graphics. CGI 2024. Lecture Notes in Computer Science, vol 15339. Springer, Cham. https://doi.org/10.1007/978-3-031-82021-2_9

  • DOI: https://doi.org/10.1007/978-3-031-82021-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-82020-5

  • Online ISBN: 978-3-031-82021-2

  • eBook Packages: Computer Science, Computer Science (R0)
