
Learning high resolution reservation for human pose estimation


Abstract

Human pose estimation in images and videos is a challenging task with many applications. Most network architectures used for pose estimation rely only on the convolutional features of the last layer, which causes a loss of information. In this paper, we propose a multi-scale fusion framework based on the hourglass network for human pose estimation, which effectively gathers information at different resolutions. While extracting features at different resolutions, the network continually supplements the high-resolution features. In addition, we design a depth pyramid residual module to fuse features of various scales. The whole network is built by stacking sub-networks; to better fit limited storage space, we use only a 2-stage stacked network. We evaluate the network on the standard MPII benchmark, where our method achieves an 88.9% PCKh score and improves the PCK score by 0.7% over the original network. Our approach achieves state-of-the-art results.
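To make the architectural ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a hypothetical DepthPyramidResidual block that fuses features pooled to several scales, a hypothetical HourglassStage that sums a full-resolution skip path with an upsampled lower-resolution path, and a 2-stage stack with intermediate heatmap supervision. All module names, channel counts, and scale choices are illustrative assumptions.

```python
# Illustrative sketch only (NOT the paper's code): multi-scale fusion with a
# pyramid residual block and a 2-stage stack of hourglass-style sub-networks.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthPyramidResidual(nn.Module):
    """Residual block whose side branch pools the input to several
    resolutions, convolves each, upsamples back, and sums them."""
    def __init__(self, channels, scales=(2, 4)):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.pyramid = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in scales]
        )
        self.scales = scales

    def forward(self, x):
        out = self.main(x)
        h, w = x.shape[-2:]
        for s, conv in zip(self.scales, self.pyramid):
            y = conv(F.avg_pool2d(x, kernel_size=s))      # branch at 1/s scale
            out = out + F.interpolate(y, size=(h, w), mode="bilinear",
                                      align_corners=False)
        return F.relu(out + x)


class HourglassStage(nn.Module):
    """One encoder/decoder sub-network with a high-resolution skip path."""
    def __init__(self, channels, joints):
        super().__init__()
        self.skip = DepthPyramidResidual(channels)        # keeps full resolution
        self.down = DepthPyramidResidual(channels)        # processed at 1/2 res
        self.up = DepthPyramidResidual(channels)
        self.head = nn.Conv2d(channels, joints, 1)        # per-joint heatmaps

    def forward(self, x):
        hi = self.skip(x)
        lo = self.down(F.max_pool2d(x, 2))
        lo = F.interpolate(self.up(lo), size=x.shape[-2:], mode="bilinear",
                           align_corners=False)
        feat = hi + lo                                     # multi-scale fusion
        return feat, self.head(feat)


class TwoStagePoseNet(nn.Module):
    """Two stacked sub-networks with intermediate supervision heatmaps."""
    def __init__(self, channels=64, joints=16):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)
        self.stages = nn.ModuleList(
            [HourglassStage(channels, joints) for _ in range(2)]
        )
        self.remap = nn.ModuleList(
            [nn.Conv2d(joints, channels, 1) for _ in range(2)]
        )

    def forward(self, img):
        x = F.relu(self.stem(img))
        heatmaps = []
        for stage, remap in zip(self.stages, self.remap):
            feat, hm = stage(x)
            heatmaps.append(hm)
            x = feat + remap(hm)       # feed predictions back into next stage
        return heatmaps                # a loss is applied to every stage's output


if __name__ == "__main__":
    net = TwoStagePoseNet()
    out = net(torch.randn(1, 3, 256, 256))
    print([o.shape for o in out])      # two sets of 16-joint heatmaps at 128x128
```

The reported 88.9% figure uses the PCKh metric on MPII: a predicted joint counts as correct when its distance to the ground-truth joint is within a fraction (0.5 for PCKh@0.5) of a reference head-segment length derived from the annotated head box. A small sketch, with an assumed array layout:

```python
# Hedged sketch of PCKh@0.5; pred/gt layouts are assumptions, not MPII tooling.
import numpy as np

def pckh(pred, gt, head_size, visible, alpha=0.5):
    """pred, gt: (N, J, 2) joint coordinates; head_size: (N,) reference head
    lengths; visible: (N, J) boolean mask of annotated joints."""
    dist = np.linalg.norm(pred - gt, axis=-1)              # (N, J) pixel errors
    correct = (dist <= alpha * head_size[:, None]) & visible
    return correct.sum() / visible.sum()                   # fraction of correct joints
```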



Funding

This study was funded by the NEPU Natural Science Foundation under Grant Nos. 2017PYZL-05, JYCX CX06 2018, and JYCX JG06 2018.

Author information


Corresponding author

Correspondence to Hongbo Bi.

Ethics declarations

Conflict of interest

Author Bingkun Gao declares that he has no conflict of interest. Author Ke Ma declares that she has no conflict of interest. Author Hongbo Bi declares that he has no conflict of interest. Author Ling Wang declares that she has no conflict of interest. Author Chenlei Wu declares that he has no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gao, B., Ma, K., Bi, H. et al. Learning high resolution reservation for human pose estimation. Multimed Tools Appl 80, 29251–29265 (2021). https://doi.org/10.1007/s11042-021-10731-4
