
3D human pose and shape estimation with dense correspondence from a single depth image

  • Original article
  • Published in The Visual Computer

Abstract

We propose a novel approach to estimating the 3D pose and shape of human bodies, with dense correspondence, from a single depth image. In contrast to most current 3D body model recovery methods for depth data, which rely on motion information from depth sequences to compute point correspondences, we reconstruct 3D human body models from a single depth image by combining correspondence learning with parametric model fitting. Specifically, a novel multi-view coarse-to-fine correspondence network is proposed that projects a 3D template model into multi-view depth images. The network predicts 2D flows of the input depth relative to each projected depth in a coarse-to-fine manner. The predicted multi-view flows are then aggregated, using the known 3D-to-2D projection, to establish accurate dense point correspondences between the 3D template and the input depth. Based on the learned correspondences, the 3D human pose and shape, represented by a parametric 3D body model, are recovered through a model-fitting method that incorporates an adversarial prior. We conduct extensive experiments on SURREAL, Human3.6M, DFAUST, and real depth data of human bodies. The results demonstrate that the proposed method outperforms state-of-the-art methods in reconstruction accuracy.
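The last step of the correspondence stage, turning a predicted 2D flow plus the known 3D-to-2D projection into dense 3D correspondences, can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names are hypothetical, a simple pinhole camera with intrinsics `fx, fy, cx, cy` is assumed, and a single view with nearest-pixel rounding stands in for the multi-view, coarse-to-fine aggregation described above.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to per-pixel 3D points (H, W, 3)
    under an assumed pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def flow_to_correspondences(template_uv, flow, input_depth, fx, fy, cx, cy):
    """Map projected template vertices through a predicted 2D flow and
    back-project the displaced pixels with the input depth, yielding one
    3D correspondence per template vertex.

    template_uv : (N, 2) integer pixel coords of projected template vertices
    flow        : (H, W, 2) predicted 2D flow (rendered depth -> input depth)
    input_depth : (H, W) input depth image
    """
    pts3d = backproject(input_depth, fx, fy, cx, cy)
    u, v = template_uv[:, 0], template_uv[:, 1]
    target = template_uv + flow[v, u]          # displaced pixel locations
    h, w = input_depth.shape
    tu = np.clip(np.round(target[:, 0]).astype(int), 0, w - 1)
    tv = np.clip(np.round(target[:, 1]).astype(int), 0, h - 1)
    return pts3d[tv, tu]                       # (N, 3) 3D correspondences
```

In the multi-view setting, one such set of correspondences would be computed per projected view and the per-view results merged; the fitted parametric body model then only needs a point-to-point objective over these pairs.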




Funding

This work was funded by the Natural Science Foundation of China (No. 61602444). This work was also supported in part by the Fundamental Research Funds for the Central Universities (NJ2020023), in part by the Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems of Beihang University (No. VRLAB2021C03), in part by the Open Project Program of the State Key Laboratory of CAD&CG of Zhejiang University (Grant No. A2106), and in part by the Open Project Program of the State Key Laboratory of Novel Software Technology of Nanjing University (No. KFKT2021B19).

Author information


Corresponding author

Correspondence to Kangkan Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, K., Zhang, G. & Yang, J. 3D human pose and shape estimation with dense correspondence from a single depth image. Vis Comput 39, 429–441 (2023). https://doi.org/10.1007/s00371-021-02339-4
