Abstract
Human pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, the unbalanced performance among joints has not received enough attention. Building on the simple baseline of Xiao et al. (Proceedings of the European Conference on Computer Vision, 2018), we propose a weighted summation method for local keypoints and a selective receptive field (SRF) unit, and we use feature fusion to tackle this problem. First, the weighted summation method for local keypoints is designed to make the network explicitly address keypoints with large loss values; it computes weights according to the loss value of each joint. Second, the SRF unit is proposed to adaptively select the receptive field size for each keypoint: multiple branches with different kernel sizes are compared using softmax attention, and the Select operator then chooses one of these branches to yield an effective receptive field. Third, the features coming from the encoder are merged in the decoder by concatenation to handle occluded joints, which enhances communication between spatial information and semantic information. The experimental results show that, as a model-agnostic approach, our method improves SimpleBaseline-50 with \(256\times 192\) input by 4.3 AP on the COCO validation set. Extensive experiments demonstrate that the proposed approach is superior to several state-of-the-art methods in terms of accuracy and robustness.
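The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the weighting rule (each joint's share of the total loss, rescaled so the weights average to 1), the function names, and the use of a soft weighted sum over branches in the style of selective kernel networks (Li et al. 2019) instead of a hard choice of one branch. Learned layers are omitted entirely.

```python
import math


def joint_weights(per_joint_loss):
    """Map per-joint losses to weights that average to 1.

    Hypothetical rule: a joint's weight is proportional to its share of
    the total loss, so joints with larger loss get larger weight.
    """
    n = len(per_joint_loss)
    total = sum(per_joint_loss)
    if total == 0:
        return [1.0] * n  # no joint is harder than another
    return [n * loss / total for loss in per_joint_loss]


def weighted_joint_loss(per_joint_loss):
    """Weighted mean of the per-joint losses using joint_weights."""
    weights = joint_weights(per_joint_loss)
    weighted = sum(w * l for w, l in zip(weights, per_joint_loss))
    return weighted / len(per_joint_loss)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def select_branch_features(branch_features, branch_scores):
    """Softmax attention over branches, then a weighted sum of features.

    branch_features: one feature vector per branch (e.g. per kernel size).
    branch_scores: one scalar score per branch (assumed to be given;
    in a real network these would come from learned layers).
    """
    attn = softmax(branch_scores)
    dim = len(branch_features[0])
    fused = [
        sum(a * feat[i] for a, feat in zip(attn, branch_features))
        for i in range(dim)
    ]
    return fused, attn
```

With per-joint losses such as `[0.1, 0.1, 0.4]`, the third joint receives the largest weight, so the weighted loss is larger than the plain mean and is dominated by the hardest joint. When one branch score is much larger than the others, the soft selection approaches a hard choice of that branch.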
References
AlZu’bi S, Shehab M, Al-Ayyoub M, Jararweh Y, Gupta B (2020) Parallel implementation for 3D medical volume fuzzy segmentation. Pattern Recogn Lett 130:312–318
Alsmirat MA, Al-Alem F, Al-Ayyoub M, Jararweh Y, Gupta B (2019) Impact of digital fingerprint image quality on the fingerprint recognition accuracy. Multimedia Tools Appl 78(3):3649–3688. https://doi.org/10.1007/s11042-017-5537-5
Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2D Human pose estimation: new benchmark and state of the art analysis. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3686–3693. https://doi.org/10.1109/CVPR.2014.471
Bin Y, Cao X, Chen X, Ge Y, Tai Y, Wang C, Li J, Huang F, Gao C, Sang N (2020) Adversarial semantic data augmentation for human pose estimation
Bulat A, Tzimiropoulos G (2016) Human pose estimation via convolutional part heatmap regression. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision-ECCV 2016. Springer, Cham, pp 717–732
Bulat A, Kossaifi J, Tzimiropoulos G, Pantic M (2020) Toward fast and accurate human pose estimation via soft-gated skip connections. In: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp 101–108. https://doi.org/10.1109/FG47880.2020.00014
Cao Z, Simon T, Wei S, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1302–1310. https://doi.org/10.1109/CVPR.2017.143
Cao Z, Hidalgo G, Simon T, Wei SE, Sheikh Y (2018) OpenPose: realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008
Chen Y, Shen C, Wei XS, Liu L, Yang J (2017) Adversarial PoseNet: a structure-aware convolutional network for human pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 1221–1230. https://doi.org/10.1109/ICCV.2017.137
Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7103–7112
Cheng B, Xiao B, Wang J, Shi H, Huang TS, Zhang L (2020) HigherHRNet: scale-aware representation learning for bottom-up human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5386–5395
Chou CJ, Chien JT, Chen HT (2018) Self adversarial training for human pose estimation. In: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp 17–30. IEEE
Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017) Multi-context attention for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1831–1840
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1110–1118
Fang H, Xie S, Tai Y, Lu C (2017) RMPE: regional multi-person pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2353–2362. https://doi.org/10.1109/ICCV.2017.256
Fatemidokht H, Rafsanjani MK, Gupta BB, Hsu CH (2021) Efficient and secure routing protocol based on artificial intelligence algorithms with UAV-Assisted for vehicular ad hoc networks in intelligent transportation systems. IEEE Trans Intell Transp Syst, pp 1–13. https://doi.org/10.1109/TITS.2020.3041746
He K, Gkioxari G, Dollar P, Girshick R (2017) Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2980–2988. https://doi.org/10.1109/ICCV.2017.322
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision-ECCV 2016, Part VI, vol 9910, pp 34–50. https://doi.org/10.1007/978-3-319-46466-4_3
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, vol 37, ICML’15, pp 448–456. JMLR.org, Lille, France
Iqbal U, Milan A, Gall J (2017) Posetrack: joint multi-person pose estimation and tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2011–2020
Itti L, Koch C (2001) Computational modelling of visual attention. Nat Rev Neurosci 2(3):194–203
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259
Ke L, Chang MC, Qi H, Lyu S (2018) Multi-scale structure-aware network for human pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 713–728
Li D, Deng L, Bhooshan Gupta B, Wang H, Choi C (2019) A novel CNN based security guaranteed image watermarking generation scenario for smart city applications. Inf Sci 479:432–447
Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 510–519
Lin TY, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp 936–944. https://doi.org/10.1109/CVPR.2017.106
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision-ECCV 2014, Part V, vol 8693, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
Luo Y, Xu Z, Liu P, Du Y, Guo JM (2018) Multi-person pose estimation via multi-layer fractal network and joints kinship pattern. IEEE Trans Image Process 28(1):142–155
Luvizon DC, Tabia H, Picard D (2019) Human pose regression by combining indirect part detection and contextual information. Comput Graph 85:15–22
Mnih V, Heess N, Graves A (2014) Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, pp 2204–2212
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision-ECCV 2016, Part VIII, vol 9912, pp 483–499. https://doi.org/10.1007/978-3-319-46484-8_29
Ning G, Zhang Z, He Z (2017) Knowledge-guided deep fractal neural networks for human pose estimation. IEEE Trans Multimedia 20(5):1246–1259
Olshausen BA, Anderson CH, Van Essen DC (1993) A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J Neurosci 13(11):4700–4719
Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4903–4911
Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler P, Schiele B (2016) DeepCut: joint subset partition and labeling for multi person pose estimation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4929–4937. https://doi.org/10.1109/CVPR.2016.533
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Sahoo SR, Gupta B (2021) Multiple features based approach for automatic fake news detection on social networks using deep learning. Appl Soft Comput 100:106983
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. arXiv:1409.1556
Su Z, Ye M, Zhang G, Dai L, Sheng J (2019) Cascade feature aggregation for human pose estimation. arXiv preprint arXiv:1902.07837
Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5693–5703
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2818–2826
Tang W, Yu P, Wu Y (2018) Deeply learned compositional models for human pose estimation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision-ECCV 2018. Springer International Publishing, Cham, pp 197–214
Tang Z, Peng X, Geng S, Wu L, Zhang S, Metaxas D (2018) Quantized densely connected u-nets for efficient landmark localization. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 339–354
Toshev A, Szegedy C (2014) DeepPose: human pose estimation via deep neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1653–1660. https://doi.org/10.1109/CVPR.2014.214
Wei SE, Ramakrishna V, Kanade T, Sheikh Y (2016) Convolutional pose machines. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4724–4732. https://doi.org/10.1109/CVPR.2016.511
Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 466–481
Yang W, Li S, Ouyang W, Li H, Wang X (2017) Learning feature pyramids for human pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 1290–1299. https://doi.org/10.1109/ICCV.2017.144
You Q, Jin H, Wang Z, Fang C, Luo J (2016) Image captioning with semantic attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4651–4659
Zhang F, Zhu X, Dai H, Ye M, Zhu C (2020) Distribution-aware coordinate representation for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zhang H, Ouyang H, Liu S, Qi X, Shen X, Yang R, Jia J (2019) Human pose estimation with spatial contextual information. arXiv preprint arXiv:1901.01760
Zhu X, Jiang Y, Luo Z (2017) Multi-person pose estimation for posetrack with enhanced part affinity fields. In: ICCV PoseTrack Workshop, vol 7
Acknowledgements
This work was supported in part by the Science and Technology Bureau of Quanzhou under Grant 2018C113R, and in part by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082.
Cite this article
Ou, Z., Luo, Y., Chen, J. et al. SRFNet: selective receptive field network for human pose estimation. J Supercomput 78, 691–711 (2022). https://doi.org/10.1007/s11227-021-03889-z
DOI: https://doi.org/10.1007/s11227-021-03889-z