Abstract
3D human reconstruction is a key technology for connecting the real and virtual worlds, but most previous work requires expensive computing resources, which makes real-time use difficult. We propose a lightweight, parametric-model-based human body reconstruction system that takes a single RGBD camera as its only input. To generate a human model end to end, we build a fast and lightweight deep-learning network named Fast Body Net (FBN). The network pays extra attention to the face and hands to enrich local details. In addition, we train a denoising auto-encoder to suppress implausible body-model states. Because no suitable RGBD-based human dataset exists, we introduce the Indoor-Human dataset for training, which contains 2500 frames of motion data from five actors captured with an Azure Kinect camera. By operating directly on depth images instead of inferring depth features from RGB, FBN stays lightweight and fast when reconstructing the parametric human model. Qualitative and quantitative experimental results show that our method improves efficiency by at least 57% while achieving accuracy comparable to state-of-the-art methods. Our study also demonstrates that consumer-grade RGBD cameras can support real-time display and interaction for virtual reality applications.
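For intuition, the sketch below illustrates the kind of pipeline the abstract describes: a small depth-only encoder that regresses parametric body parameters, followed by a denoising auto-encoder over the pose vector. It is a minimal illustration only; the actual FBN architecture, parameter layout, and training details are not given in the abstract, so every layer size and name here is an assumption.

```python
# Illustrative sketch only: the real FBN architecture and the exact parametric
# body representation used by the paper are not specified in the abstract, so
# all layer sizes, names, and dimensions below are assumptions.
import torch
import torch.nn as nn


class DepthParamRegressor(nn.Module):
    """Toy depth-only encoder that regresses parametric body parameters
    (e.g. pose + shape coefficients) from a single-channel depth image."""

    def __init__(self, n_pose: int = 72, n_shape: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, n_pose + n_shape)

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (B, 1, H, W) depth map -> (B, n_pose + n_shape) parameters
        return self.head(self.backbone(depth))


class PoseDenoiser(nn.Module):
    """Toy denoising auto-encoder over pose parameters: trained to map
    noisy or implausible poses back toward the plausible-pose manifold."""

    def __init__(self, n_pose: int = 72, latent: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_pose, latent), nn.ReLU(inplace=True))
        self.dec = nn.Linear(latent, n_pose)

    def forward(self, pose: torch.Tensor) -> torch.Tensor:
        return self.dec(self.enc(pose))


# Example: one depth frame in, pose/shape parameters out, then denoised pose.
depth = torch.rand(1, 1, 256, 256)           # placeholder 256x256 depth map
params = DepthParamRegressor()(depth)         # (1, 82) pose + shape vector
clean_pose = PoseDenoiser()(params[:, :72])   # refined pose parameters
```

Working directly on the depth channel, as in this sketch, avoids an RGB feature extractor entirely, which is the property the abstract credits for keeping the network lightweight and fast.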
Acknowledgements
This work is sponsored by Shanghai Key Research Laboratory of NSAI.