Position constrained network for 3D human pose estimation

  • Special Issue Paper
  • Multimedia Systems

Abstract

Human pose estimation is a challenging task in computer vision. Mainstream methods have made great progress, but they remain weak in two respects: first, the feature extraction module is not strong enough for representation learning; second, the training process does not take full advantage of the projection model from 3D space to the 2D image plane. In this work, we propose a human pose estimation framework that takes the 3D root coordinates as an auxiliary input alongside the 2D joint coordinates, providing a positional reference for the recovered 3D joint coordinates, and that uses the intrinsic camera parameters to construct additional projection constraints for recovering the 3D joint coordinates. Moreover, we enhance feature learning through a residual branch. We tested our method on two benchmark datasets for human pose estimation, and the experimental results show that the proposed model outperforms current mainstream algorithms.
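
The abstract does not spell out the form of the projection constraint, so the following is only a minimal sketch of one way such a constraint could look. It assumes a standard pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy`, assumes the network predicts joints relative to the root, and uses a mean pixel distance as the penalty. All function names, parameter names, and numeric values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def to_camera_space(root: Point3D, relative_joints: Sequence[Point3D]) -> List[Point3D]:
    """Shift root-relative joints by the 3D root position (assumed convention)."""
    rx, ry, rz = root
    return [(rx + x, ry + y, rz + z) for x, y, z in relative_joints]


def project(joints: Sequence[Point3D], fx: float, fy: float,
            cx: float, cy: float) -> List[Point2D]:
    """Pinhole projection: u = fx * X / Z + cx, v = fy * Y / Z + cy."""
    projected = []
    for x, y, z in joints:
        if z <= 0:
            raise ValueError("joint must lie in front of the camera (Z > 0)")
        projected.append((fx * x / z + cx, fy * y / z + cy))
    return projected


def reprojection_loss(pred_2d: Sequence[Point2D], target_2d: Sequence[Point2D]) -> float:
    """Mean Euclidean distance in pixels between projected and observed 2D joints."""
    dists = [math.hypot(pu - tu, pv - tv)
             for (pu, pv), (tu, tv) in zip(pred_2d, target_2d)]
    return sum(dists) / len(dists)


# Hypothetical example: a root joint 4 m in front of the camera and one other joint.
root = (0.0, 0.0, 4.0)
relative = [(0.0, 0.0, 0.0), (0.5, -1.0, 0.0)]
observed_2d = [(500.0, 500.0), (628.0, 246.0)]

joints_3d = to_camera_space(root, relative)
projected_2d = project(joints_3d, fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
loss = reprojection_loss(projected_2d, observed_2d)
```

In a training setup, a term like `loss` would typically be added to the 3D joint regression loss so that predicted 3D joints stay consistent with the 2D input once projected back through the camera. How the paper weights or formulates this term is not stated in the abstract.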

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 61836002, 62125201, 62020106007, 62002314, and 61972361.

Author information

Corresponding author

Correspondence to Jun Yu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dong, X., Yu, J. & Zhang, J. Position constrained network for 3D human pose estimation. Multimedia Systems 29, 459–468 (2023). https://doi.org/10.1007/s00530-021-00880-9

  • DOI: https://doi.org/10.1007/s00530-021-00880-9
