
3D human pose estimation by depth map

  • Original Article
  • Published in The Visual Computer

Abstract

We present a new approach to 3D human pose estimation from a single image. State-of-the-art methods for 3D pose estimation have focused on predicting the full-body pose of a single person and have paid little attention to two challenges that arise in practice: incomplete body poses and the presence of multiple persons in an image. In this paper, we introduce depth maps to address these problems. Our approach predicts the depth of the human pose over all spatial grids, which supports 3D pose estimation for incomplete or full bodies of multiple persons. The proposed depth maps encode the depths of limbs rather than joints; they are more informative and are reversibly convertible to joint depths. The unified network is trained end to end on mixed 2D- and 3D-annotated samples. Experiments show that our algorithm achieves state-of-the-art results on Human3.6M, the largest publicly available 3D pose estimation benchmark. Moreover, qualitative results demonstrate the effectiveness of our approach for 3D pose estimation of incomplete human bodies and multiple persons.
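The abstract notes that the limb depth maps are reversibly convertible to joint depths. The sketch below, in Python with NumPy, illustrates one simple way such a conversion could work, assuming per-limb depth channels aligned with the same spatial grid as the 2D pose; the skeleton definition, function name, and averaging rule are illustrative assumptions, not the paper's actual decoding procedure.

```python
import numpy as np

# Hypothetical 5-limb skeleton used only for illustration: each limb
# connects a pair of joint indices (these pairs are assumptions, not
# the paper's skeleton definition).
LIMBS = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]

def joint_depths_from_limb_maps(limb_depth_maps, joints_2d):
    """Recover per-joint depths from per-limb depth maps.

    limb_depth_maps: (L, H, W) array, one predicted depth map per limb.
    joints_2d:       (J, 2) array of (x, y) pixel coordinates of the
                     detected 2D joints on the same grid as the maps.

    For each joint, read the value at its 2D location from every limb
    map whose limb contains that joint, then average the readings.
    """
    num_joints = joints_2d.shape[0]
    depth_sum = np.zeros(num_joints)
    count = np.zeros(num_joints)
    for limb_idx, (a, b) in enumerate(LIMBS):
        for j in (a, b):
            x, y = joints_2d[j].astype(int)
            depth_sum[j] += limb_depth_maps[limb_idx, y, x]
            count[j] += 1
    return depth_sum / np.maximum(count, 1)

# Toy usage: 6 joints, 5 limb maps on a 64x64 grid.
limb_maps = np.random.rand(5, 64, 64)
joints = np.random.randint(0, 64, size=(6, 2)).astype(float)
print(joint_depths_from_limb_maps(limb_maps, joints))
```

Averaging over the limbs incident to a joint is only one plausible readout; the paper's conversion may weight or sample the limb maps differently.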


Notes

  1. https://pytorch.org/.

  2. https://github.com/last-one/Pytorch_Realtime_Multi-Person_Pose_Estimation.


Funding

The work of Jianzhai Wu, Dewen Hu, FengTao Xiang, Xingsheng Yuan and Jiongming Su was funded by the Natural Science Foundation of China (Grant Nos. 61603402, 91420302, 61603403, 61703417 and 61806212, respectively).

Author information


Corresponding author

Correspondence to Jianzhai Wu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wu, J., Hu, D., Xiang, F. et al. 3D human pose estimation by depth map. Vis Comput 36, 1401–1410 (2020). https://doi.org/10.1007/s00371-019-01740-4
