
3D human pose estimation by depth map

  • Original Article
  • Published in The Visual Computer

Abstract

We present a new approach to 3D human pose estimation from a single image. State-of-the-art methods for 3D pose estimation have focused on predicting the full-body pose of a single person and have paid little attention to two challenges that arise in practice: incomplete body poses and the presence of multiple persons in an image. In this paper, we introduce depth maps to address these problems. Our approach predicts the depth of the human pose over all spatial grids, which supports 3D pose estimation for incomplete or full bodies of multiple persons. The proposed depth maps encode the depths of limbs rather than joints; they are more informative and are reversibly convertible to joint depths. The unified network is trained end to end on mixed 2D- and 3D-annotated samples. Experiments show that our algorithm achieves state-of-the-art results on Human3.6M, the largest publicly available 3D pose estimation benchmark. Moreover, qualitative results demonstrate the effectiveness of our approach for 3D pose estimation of incomplete human bodies and multiple persons.
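The abstract notes that the limb depth maps are reversibly convertible to joint depths. The sketch below, in Python with NumPy, illustrates one simple way such a conversion could work, assuming per-limb depth channels aligned with the same spatial grid as the 2D pose; the skeleton definition, function name, and averaging rule are illustrative assumptions, not the paper's actual decoding procedure.

```python
import numpy as np

# Hypothetical 5-limb skeleton used only for illustration: each limb
# connects a pair of joint indices (these pairs are assumptions, not
# the paper's skeleton definition).
LIMBS = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]

def joint_depths_from_limb_maps(limb_depth_maps, joints_2d):
    """Recover per-joint depths from per-limb depth maps.

    limb_depth_maps: (L, H, W) array, one predicted depth map per limb.
    joints_2d:       (J, 2) array of (x, y) pixel coordinates of the
                     detected 2D joints on the same grid as the maps.

    For each joint, read the value at its 2D location from every limb
    map whose limb contains that joint, then average the readings.
    """
    num_joints = joints_2d.shape[0]
    depth_sum = np.zeros(num_joints)
    count = np.zeros(num_joints)
    for limb_idx, (a, b) in enumerate(LIMBS):
        for j in (a, b):
            x, y = joints_2d[j].astype(int)
            depth_sum[j] += limb_depth_maps[limb_idx, y, x]
            count[j] += 1
    return depth_sum / np.maximum(count, 1)

# Toy usage: 6 joints, 5 limb maps on a 64x64 grid.
limb_maps = np.random.rand(5, 64, 64)
joints = np.random.randint(0, 64, size=(6, 2)).astype(float)
print(joint_depths_from_limb_maps(limb_maps, joints))
```

Averaging over the limbs incident to a joint is only one plausible readout; the paper's conversion may weight or sample the limb maps differently.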


Notes

  1. https://pytorch.org/.

  2. https://github.com/last-one/Pytorch_Realtime_Multi-Person_Pose_Estimation.


Funding

The work of Jianzhai Wu, Dewen Hu, FengTao Xiang, Xingsheng Yuan and Jiongming Su was funded by the Natural Science Foundation of China (Grant Nos. 61603402, 91420302, 61603403, 61703417 and 61806212, respectively).

Author information


Corresponding author

Correspondence to Jianzhai Wu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wu, J., Hu, D., Xiang, F. et al. 3D human pose estimation by depth map. Vis Comput 36, 1401–1410 (2020). https://doi.org/10.1007/s00371-019-01740-4
