Learning Enriched Global Context Information for Human Pose Estimation

Wang, Rui; Liu, Ruyi; Li, Yanping; Wang, Xiangyang

doi:10.1007/s11063-021-10699-0

Published: 27 January 2022

Volume 54, pages 1663–1678, (2022)
Neural Processing Letters

Rui Wang¹,
Ruyi Liu¹,
Yanping Li² &
…
Xiangyang Wang ORCID: orcid.org/0000-0003-1394-6068¹

Abstract

A classic approach to human pose estimation generates a heatmap centered on each keypoint location and uses it as a small-region representation for supervised learning. Networks built on this approach need to learn multi-scale feature maps and global context information under different receptive fields. A larger receptive field lets the network capture more of the human body structure, which carries more global, higher-level semantic features, as well as the connections between distant keypoints. However, convolution is a local operation: it is poor at capturing global relationships and cannot fully take the surrounding pixel information into account. Furthermore, the resolution of the detected results for a small-region representation is generally very low, which limits the accuracy of keypoint detection. In this paper, we propose a switchable convolution operation that adaptively selects a larger receptive field and obtains richer global context information. In addition, we use a dual attention unit to reconstruct the feature map, enhancing useful features and strengthening the structural information between human body parts in the heatmap. Experiments on the COCO and MPII datasets show that our method effectively improves human pose estimation performance.
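To make the heatmap representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian-shaped heatmap and an output stride of 4 (both common choices, neither stated in the abstract), and the names `gaussian_heatmap`, `argmax_2d`, `STRIDE` and `SIGMA` are hypothetical. It also illustrates why a low-resolution heatmap limits localization accuracy: the decoded position can only land on the coarse grid.

```python
import math

STRIDE = 4    # assumed ratio between image and heatmap resolution
SIGMA = 2.0   # assumed Gaussian spread, in heatmap pixels


def gaussian_heatmap(width, height, cx, cy, sigma=SIGMA):
    """Return a height x width grid (list of rows) holding a 2D Gaussian peaked at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def argmax_2d(heatmap):
    """Decode a keypoint as the (x, y) grid cell with the largest heatmap value."""
    best_value, best_xy = -1.0, (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy


# Ground-truth keypoint in image coordinates (illustrative values).
keypoint_image = (83, 121)

# Training target: a Gaussian centered on the keypoint, in heatmap coordinates.
cx, cy = keypoint_image[0] / STRIDE, keypoint_image[1] / STRIDE
target = gaussian_heatmap(width=48, height=64, cx=cx, cy=cy)

# Decoding: take the peak cell and scale it back to image coordinates.
peak = argmax_2d(target)
recovered = (peak[0] * STRIDE, peak[1] * STRIDE)
error = (recovered[0] - keypoint_image[0], recovered[1] - keypoint_image[1])
```

Even with a perfect heatmap, the recovered position here is off by one image pixel in each direction, because the peak is snapped to the nearest cell of the coarse grid before being scaled back.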

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 61771299.

Author information

Authors and Affiliations

School of Communication and Information Engineering, Shanghai University, Shanghai, China
Rui Wang, Ruyi Liu & Xiangyang Wang
Shanghai University, Shanghai, China
Yanping Li

Corresponding author

Correspondence to Xiangyang Wang.

About this article

Cite this article

Wang, R., Liu, R., Li, Y. et al. Learning Enriched Global Context Information for Human Pose Estimation. Neural Process Lett 54, 1663–1678 (2022). https://doi.org/10.1007/s11063-021-10699-0

Accepted: 17 November 2021
Published: 27 January 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11063-021-10699-0
