SA-GCN: structure-aware graph convolutional networks for crowd pose estimation

The Journal of Supercomputing

Abstract

In this paper, we aim to capture the structural dependencies among human joints and to improve the localization accuracy of invisible joints. We propose a novel framework for crowd pose estimation, the Structure-aware Graph Convolutional Network (SA-GCN), which consists of two components: Sample Pose Net and Refined Pose Net. First, Sample Pose Net includes a multi-scale feature fusion module, which uses multi-scale features to capture small-scale people and to extract as much of the global "rough" pose as possible. Second, channel and spatial attention are injected into the multi-scale feature fusion module to strengthen the features of small-scale people. Finally, Refined Pose Net obtains its graph convolution from several disentangled, parallel sub-graph convolution modules. The global and structural advantages of graph convolution make it better suited to predicting the difficult joints left by Sample Pose Net. In addition, SA-GCN has fewer parameters than popular pose estimation networks. In this way, SA-GCN produces feature maps for pose proposal and refinement, respectively. Comprehensive experiments demonstrate that the proposed method achieves superior pose estimation results on two benchmark datasets, CrowdPose and MSCOCO. Moreover, SA-GCN significantly outperforms state-of-the-art methods on CrowdPose and almost always generates plausible human pose predictions.

Figs. 1–7 (images not reproduced)

Availability of data and materials

The data come from publicly available benchmark datasets.

Code availability

Custom code.

Acknowledgements

This work was supported by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082.

Funding

Natural Science Foundation of Fujian Province, China, under Grant 2020J01082.

Author information

Corresponding author

Correspondence to Yanmin Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Luo, Y. SA-GCN: structure-aware graph convolutional networks for crowd pose estimation. J Supercomput 79, 10046–10062 (2023). https://doi.org/10.1007/s11227-023-05055-z

  • DOI: https://doi.org/10.1007/s11227-023-05055-z