Abstract
Existing parametric-model-assisted methods for human surface reconstruction from single RGB-D images still struggle to produce fine results. This article proposes an improved method with three tactics to overcome this limitation. First, since an inaccurately estimated parametric model degrades reconstruction quality, a direct optimization scheme is adopted to refine the model and obtain a better back prior. Second, a new encoder-decoder structured, residual-feature-based back refinement network is proposed to further polish the initial back surface; it preserves global human shapes and poses without missing body parts while keeping local details. A learnable-weight cross-attention module (LCA) is embedded in this network, which adaptively merges high-level residual features from both the SMPL-X and initial back depths via cross-attention to recover rich details. Third, a new silhouette loss on both the front and back surfaces is introduced, yielding fine back surfaces with a smooth transition between front and back. Combining these three tactics, a novel framework for robust surface reconstruction from single RGB-D images is obtained. Experimental results show that the proposed approach recovers surfaces with significant detail and without missing parts.
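The paper does not spell out the LCA module's internals in this abstract, but the general mechanism it names, merging two feature streams via cross-attention and a learnable blending weight, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes; the function name `lca_fuse`, the projection matrices `Wq`/`Wk`/`Wv`, and the scalar gate `alpha` are all hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lca_fuse(f_smpl, f_back, Wq, Wk, Wv, alpha):
    """Hypothetical LCA-style fusion.

    f_smpl: (N, d) residual features from the SMPL-X depth branch.
    f_back: (M, d) residual features from the initial back-depth branch.
    The SMPL-X stream queries the back-depth stream via cross-attention,
    and a learnable scalar gate blends the attended result with the input.
    """
    Q = f_smpl @ Wq                                           # (N, d)
    K = f_back @ Wk                                           # (M, d)
    V = f_back @ Wv                                           # (M, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)   # (N, M)
    attended = attn @ V                                       # (N, d)
    w = 1.0 / (1.0 + np.exp(-alpha))   # sigmoid keeps the blend in [0, 1]
    return w * attended + (1.0 - w) * f_smpl
```

In a trained network `Wq`, `Wk`, `Wv`, and `alpha` would be learned parameters; with `alpha` driven strongly negative the module falls back to the SMPL-X features alone, which is one plausible reading of how an adaptive merge can avoid losing body parts.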
Acknowledgement
This work is supported by the Key Natural Science Fund of Department of Education of Anhui Province (KJ2021A0042).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, Y., Qian, Y., Dai, R., Wang, L., Liu, Z., Fang, X. (2025). Towards Finer Human Reconstruction for Single RGB-D Images. In: Magnenat-Thalmann, N., Kim, J., Sheng, B., Deng, Z., Thalmann, D., Li, P. (eds) Advances in Computer Graphics. CGI 2024. Lecture Notes in Computer Science, vol 15339. Springer, Cham. https://doi.org/10.1007/978-3-031-82021-2_9
DOI: https://doi.org/10.1007/978-3-031-82021-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-82020-5
Online ISBN: 978-3-031-82021-2
eBook Packages: Computer Science, Computer Science (R0)