Abstract
In this paper, we aim to capture the structural dependency among human joints and improve the localization accuracy of invisible joints. We propose a novel framework for crowd pose estimation, the Structure-aware Graph Convolutional Network (SA-GCN), which consists of two components: Sample Pose Net and Refined Pose Net. Firstly, Sample Pose Net includes a multi-scale feature fusion module, which uses multi-scale features to capture small-scale persons and to extract as much of the global "rough" pose as possible. Secondly, channel and spatial attention are injected into the multi-scale feature fusion module to strengthen the features of small-scale persons. Finally, Refined Pose Net performs graph convolution through several disentangled, parallel sub-graph convolution modules. The global and structural properties of graph convolution make it better suited to predicting the difficult keypoints in the sample pose. In this way, SA-GCN produces feature maps for pose proposal and pose refinement, respectively. In addition, SA-GCN has fewer parameters than popular pose estimation networks. Comprehensive experiments demonstrate that the proposed method achieves superior pose estimation results on two benchmark datasets, CrowdPose and MSCOCO. Moreover, SA-GCN significantly outperforms state-of-the-art methods on CrowdPose and almost always generates plausible human pose predictions.
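The refinement idea described above, graph convolution applied through several parallel sub-graph modules, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy 5-joint skeleton, the split into two sub-graphs, the summation of branch outputs, the random weights, and all function names are assumptions made for illustration only. The layer itself uses the common symmetric normalization D^{-1/2}(A + I)D^{-1/2} with a ReLU activation.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected joint graph."""
    a = np.eye(num_joints)  # self-loops keep each joint's own feature
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(features, adj, weight):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj @ features @ weight, 0.0)


def parallel_subgraph_conv(features, subgraph_edges, weights):
    """Apply one graph convolution per sub-graph and sum the branch outputs.

    Summation is an assumed fusion rule; the abstract does not specify one.
    """
    num_joints = features.shape[0]
    outputs = [
        graph_conv(features, normalized_adjacency(edges, num_joints), w)
        for edges, w in zip(subgraph_edges, weights)
    ]
    return np.sum(outputs, axis=0)


# Hypothetical toy input: 5 joints in a chain, each with a coarse (x, y) estimate.
rng = np.random.default_rng(0)
num_joints, in_dim, out_dim = 5, 2, 4
coarse_pose = rng.normal(size=(num_joints, in_dim))

# Two assumed sub-graphs, e.g. an "upper" and a "lower" part of the chain.
subgraph_edges = [[(0, 1), (1, 2)], [(2, 3), (3, 4)]]
weights = [rng.normal(size=(in_dim, out_dim)) for _ in subgraph_edges]

refined = parallel_subgraph_conv(coarse_pose, subgraph_edges, weights)
print(refined.shape)  # (5, 4)
```

Each branch only mixes information along the edges of its own sub-graph, so a joint with an unreliable coarse estimate (for example, an occluded one) receives features from its structural neighbours rather than from the image evidence alone.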
Availability of data and materials
The data come from the public benchmark datasets CrowdPose and MSCOCO.
Code availability
Custom code.
Acknowledgements
This work was supported by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082.
Funding
Natural Science Foundation of Fujian Province, China under Grant 2020J01082.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, J., Luo, Y. SA-GCN: structure-aware graph convolutional networks for crowd pose estimation. J Supercomput 79, 10046–10062 (2023). https://doi.org/10.1007/s11227-023-05055-z
DOI: https://doi.org/10.1007/s11227-023-05055-z