
3D human pose and shape estimation with dense correspondence from a single depth image

  • Original article
  • Published in The Visual Computer

Abstract

We propose a novel approach to estimating the 3D pose and shape of human bodies, with dense correspondence, from a single depth image. In contrast to most current 3D body model recovery methods for depth data, which rely on motion information from depth sequences to compute point correspondences, we reconstruct 3D human body models from a single depth image by combining correspondence learning with parametric model fitting. Specifically, a novel multi-view coarse-to-fine correspondence network is proposed that projects a 3D template model into multi-view depth images. The network predicts 2D flows of the input depth relative to each projected depth in a coarse-to-fine manner. The predicted multi-view flows are then aggregated, using the known 3D-to-2D projection, to establish accurate dense point correspondences between the 3D template and the input depth. Based on the learned correspondences, the 3D human pose and shape, represented by a parametric 3D body model, are recovered through a model-fitting method that incorporates an adversarial prior. We conduct extensive experiments on SURREAL, Human3.6M, DFAUST, and real depth data of human bodies. The results demonstrate that the proposed method outperforms state-of-the-art methods in reconstruction accuracy.
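The last step of the correspondence stage, turning a predicted 2D flow plus the known 3D-to-2D projection into dense 3D correspondences, can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names are hypothetical, a simple pinhole camera with intrinsics `fx, fy, cx, cy` is assumed, and a single view with nearest-pixel rounding stands in for the multi-view, coarse-to-fine aggregation described above.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to per-pixel 3D points (H, W, 3)
    under an assumed pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def flow_to_correspondences(template_uv, flow, input_depth, fx, fy, cx, cy):
    """Map projected template vertices through a predicted 2D flow and
    back-project the displaced pixels with the input depth, yielding one
    3D correspondence per template vertex.

    template_uv : (N, 2) integer pixel coords of projected template vertices
    flow        : (H, W, 2) predicted 2D flow (rendered depth -> input depth)
    input_depth : (H, W) input depth image
    """
    pts3d = backproject(input_depth, fx, fy, cx, cy)
    u, v = template_uv[:, 0], template_uv[:, 1]
    target = template_uv + flow[v, u]          # displaced pixel locations
    h, w = input_depth.shape
    tu = np.clip(np.round(target[:, 0]).astype(int), 0, w - 1)
    tv = np.clip(np.round(target[:, 1]).astype(int), 0, h - 1)
    return pts3d[tv, tu]                       # (N, 3) 3D correspondences
```

In the multi-view setting, one such set of correspondences would be computed per projected view and the per-view results merged; the fitted parametric body model then only needs a point-to-point objective over these pairs.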




Funding

This work was funded by the Natural Science Foundation of China (No. 61602444). This work was also supported in part by the Fundamental Research Funds for the Central Universities (NJ2020023), in part by the Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems of Beihang University (No. VRLAB2021C03), in part by the Open Project Program of the State Key Laboratory of CAD&CG of Zhejiang University (Grant No. A2106), and in part by the Open Project Program of the State Key Laboratory of Novel Software Technology of Nanjing University (No. KFKT2021B19).

Author information


Corresponding author

Correspondence to Kangkan Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, K., Zhang, G. & Yang, J. 3D human pose and shape estimation with dense correspondence from a single depth image. Vis Comput 39, 429–441 (2023). https://doi.org/10.1007/s00371-021-02339-4
