SA-GCN: structure-aware graph convolutional networks for crowd pose estimation

Wang, Jia; Luo, Yanmin

doi:10.1007/s11227-023-05055-z

Published: 30 January 2023

Volume 79, pages 10046–10062, (2023)

The Journal of Supercomputing

Abstract

In this paper, we aim to capture the structural dependencies among human joints and to improve the localization accuracy of invisible joints. We propose a novel framework for crowd pose estimation, the Structure-aware Graph Convolutional Network (SA-GCN), which consists of two components: Sample Pose Net and Refined Pose Net. Firstly, Sample Pose Net includes a multi-scale feature fusion module, which uses multi-scale features to capture small-scale persons and to extract as much of the global "rough" pose as possible. Secondly, channel and spatial attention are injected into the multi-scale feature fusion module to strengthen the features of small-scale persons. Finally, in Refined Pose Net, graph convolution is performed by several disentangled, parallel sub-graph convolution modules. The global and structural properties of graph convolution make it better suited to predicting the difficult joints in the sampled pose. In this way, SA-GCN produces feature maps for pose proposal and pose refinement, respectively. In addition, SA-GCN has fewer parameters than popular pose estimation networks. Comprehensive experiments demonstrate that the proposed method achieves superior pose estimation results on two benchmark datasets, CrowdPose and MSCOCO. Moreover, SA-GCN significantly outperforms state-of-the-art methods on CrowdPose and almost always generates plausible human pose predictions.
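
To make the idea of propagating information along the skeleton concrete, the following is a minimal sketch of a single graph convolution step over joint features. It uses the widely known normalized-adjacency propagation rule of Kipf and Welling rather than the disentangled sub-graph modules of SA-GCN, whose details are not given in the abstract. The five-joint skeleton, the edge list, the feature values, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy skeleton (assumption): 5 joints connected as a small tree.
EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, num_nodes):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
           for i in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_conv(features, adj_norm, weight):
    """One layer: ReLU(A_hat @ X @ W), mixing each joint with its neighbours."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(v, 0.0) for v in row] for row in hidden]


# Illustrative per-joint features (here 2-D values) and an identity weight.
joint_features = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [1.0, 2.0], [1.0, 3.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]

adj_norm = normalized_adjacency(EDGES, NUM_JOINTS)
refined = graph_conv(joint_features, adj_norm, weight)
for joint_id, row in enumerate(refined):
    print(joint_id, [round(v, 3) for v in row])
```

Each output row is a weighted mix of a joint's own features and those of its skeleton neighbours, which is how a joint with weak image evidence can borrow evidence from connected joints. In a trained network the weight matrix would be learned instead of fixed to the identity.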

Availability of data and materials

The data come from common benchmark datasets.

Code availability

Custom code.

Acknowledgements

This work was supported by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082.

Funding

Natural Science Foundation of Fujian Province, China under Grant 2020J01082.

Author information

Authors and Affiliations

College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, People’s Republic of China
Jia Wang & Yanmin Luo
Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen, 361021, People’s Republic of China
Jia Wang & Yanmin Luo

Corresponding author

Correspondence to Yanmin Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Luo, Y. SA-GCN: structure-aware graph convolutional networks for crowd pose estimation. J Supercomput 79, 10046–10062 (2023). https://doi.org/10.1007/s11227-023-05055-z

Accepted: 10 January 2023
Published: 30 January 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11227-023-05055-z
