Learning Enriched Global Context Information for Human Pose Estimation

Neural Processing Letters

Abstract

A classic approach to human pose estimation generates a heatmap centered on each keypoint location, a small-region representation that serves as the target for supervised learning. Networks built on this approach need to learn multi-scale feature maps and global context information under different receptive fields. A larger receptive field lets the network learn more of the human body structure, which carries more global, higher-level semantic features, and capture connections between distant keypoints. However, convolution is a local operation: it struggles to capture global relationships and cannot fully account for surrounding pixel information. Furthermore, the resolution of the predicted small-region representation is generally very low, which limits keypoint detection accuracy. In this paper, we propose a switchable convolution operation that adaptively selects a larger receptive field and obtains richer global context information. In addition, we use a dual attention unit to reconstruct the feature map, which strengthens useful features and further enhances the structural information between human body parts in the heatmap. Experiments on the COCO and MPII datasets show that our method effectively improves human pose estimation performance.
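
The heatmap representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a 2D Gaussian target and argmax decoding, which are common choices in heatmap-based pose estimation but are not confirmed by this abstract as the paper's exact formulation. The function names, the 64 x 48 heatmap size, and `sigma=2.0` are hypothetical values chosen for illustration.

```python
import numpy as np


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Build a heatmap with a 2D Gaussian centered on keypoint (cx, cy).

    Assumption: an unnormalized Gaussian with peak value 1.0 at the keypoint.
    """
    ys, xs = np.mgrid[0:height, 0:width]      # pixel coordinate grids
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2      # squared distance to the keypoint
    return np.exp(-d2 / (2.0 * sigma ** 2))


def decode_peak(heatmap):
    """Recover the keypoint as the (x, y) location of the heatmap maximum."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


# Hypothetical 64 x 48 heatmap with one keypoint at x=20, y=30.
heatmap = gaussian_heatmap(64, 48, cx=20, cy=30)
print(decode_peak(heatmap))  # (20, 30)
```

Because the decoded location is an integer cell on a grid that is typically coarser than the input image, any sub-cell offset of the true keypoint is lost. This is one way to read the abstract's remark that low heatmap resolution limits keypoint accuracy.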

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 61771299.

Author information

Corresponding author

Correspondence to Xiangyang Wang.

About this article

Cite this article

Wang, R., Liu, R., Li, Y. et al. Learning Enriched Global Context Information for Human Pose Estimation. Neural Process Lett 54, 1663–1678 (2022). https://doi.org/10.1007/s11063-021-10699-0

  • DOI: https://doi.org/10.1007/s11063-021-10699-0