SRFNet: selective receptive field network for human pose estimation

Ou, Zhilong; Luo, YanMin; Chen, Jin; Chen, Geng

doi:10.1007/s11227-021-03889-z

Published: 31 May 2021

The Journal of Supercomputing, volume 78, pages 691–711 (2022)

Zhilong Ou^1,2, YanMin Luo^1,2 (ORCID: orcid.org/0000-0001-7596-3299), Jin Chen^3 & Geng Chen^1,2

Abstract

Human pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, the unbalanced performance among joints has not received enough attention. Building on the simple baseline of Xiao et al. (Proceedings of the European Conference on Computer Vision, 2018), we propose a weighted summation method for local keypoints and a selective receptive field (SRF) unit, and we use feature fusion to tackle this problem. First, the weighted summation method for local keypoints is designed to make the network explicitly address keypoints with large loss values; it calculates weights according to the loss value of each joint. Second, the SRF unit is proposed to adaptively select the receptive field size for keypoints: multiple branches with different kernel sizes are compared using softmax attention, and the Select operator then chooses one of these branches to yield effective receptive fields. Third, the features coming from the encoder are merged in the decoder by concatenation to handle occluded joints, which enhances communication between spatial information and semantic information. The experimental results show that, as a model-agnostic approach, our method improves SimpleBaseline-50 (\(256\times 192\) input) by 4.3 AP on the COCO validation set. Extensive experiments demonstrate that the proposed approach is superior to several state-of-the-art methods in terms of accuracy and robustness.
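The abstract gives no formulas, so the following is only a minimal sketch of two of the ideas it names, under stated assumptions. It assumes that per-joint weights are the per-joint losses normalized to sum to one, and that branch selection is a soft, attention-weighted sum in the style of selective kernel networks (Li et al. 2019). The function names `joint_weights`, `weighted_joint_loss`, `softmax`, and `select_branches` are hypothetical and do not come from the paper.

```python
import math


def joint_weights(per_joint_loss):
    """Assumed rule: weight each joint by its share of the total loss."""
    total = sum(per_joint_loss)
    if total == 0.0:
        # All joints are already perfect: fall back to uniform weights.
        return [1.0 / len(per_joint_loss)] * len(per_joint_loss)
    return [loss / total for loss in per_joint_loss]


def weighted_joint_loss(per_joint_loss):
    """Weighted sum of per-joint losses; joints with larger loss count more."""
    weights = joint_weights(per_joint_loss)
    return sum(w * loss for w, loss in zip(weights, per_joint_loss))


def softmax(scores):
    """Numerically stable softmax over a list of branch scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def select_branches(branch_features, branch_scores):
    """Assumed soft selection: fuse branches by their softmax attention."""
    attention = softmax(branch_scores)
    length = len(branch_features[0])
    return [
        sum(a * feats[i] for a, feats in zip(attention, branch_features))
        for i in range(length)
    ]


if __name__ == "__main__":
    losses = [1.0, 3.0]  # toy per-joint losses, e.g. an easy and a hard joint
    print(joint_weights(losses))
    print(weighted_joint_loss(losses))
    # Two toy branches (e.g. small and large kernels) with equal scores.
    print(select_branches([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

In a real training loop the weights would normally be treated as constants when back-propagating, and the branch scores would be produced by the network rather than supplied by hand; both points are design assumptions of this sketch rather than details confirmed by the abstract.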

References

AlZu’bi S, Shehab M, Al-Ayyoub M, Jararweh Y, Gupta B (2020) Parallel implementation for 3D medical volume fuzzy segmentation. Pattern Recogn Lett 130:312–318
Alsmirat MA, Al-Alem F, Al-Ayyoub M, Jararweh Y, Gupta B (2019) Impact of digital fingerprint image quality on the fingerprint recognition accuracy. Multimedia Tools Appl 78(3):3649–3688. https://doi.org/10.1007/s11042-017-5537-5
Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2D Human pose estimation: new benchmark and state of the art analysis. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3686–3693. https://doi.org/10.1109/CVPR.2014.471
Bin Y, Cao X, Chen X, Ge Y, Tai Y, Wang C, Li J, Huang F, Gao C, Sang N (2020) Adversarial semantic data augmentation for human pose estimation
Bulat A, Tzimiropoulos G (2016) Human pose estimation via convolutional part heatmap regression. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision-ECCV 2016. Springer, Cham, pp 717–732
Bulat A, Kossaifi J, Tzimiropoulos G, Pantic M (2020) Toward fast and accurate human pose estimation via soft-gated skip connections. In: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp 101–108. https://doi.org/10.1109/FG47880.2020.00014
Cao Z, Simon T, Wei S, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1302–1310. https://doi.org/10.1109/CVPR.2017.143
Cao Z, Hidalgo G, Simon T, Wei SE, Sheikh Y (2018) OpenPose: realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008
Chen Y, Shen C, Wei XS, Liu L, Yang J (2017) Adversarial PoseNet: a structure-aware convolutional network for human pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 1221–1230. https://doi.org/10.1109/ICCV.2017.137
Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7103–7112
Cheng B, Xiao B, Wang J, Shi H, Huang TS, Zhang L (2020) HigherHRNet: scale-aware representation learning for bottom-up human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5386–5395
Chou CJ, Chien JT, Chen HT (2018) Self adversarial training for human pose estimation. In: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp 17–30. IEEE
Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017) Multi-context attention for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1831–1840
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1110–1118
Fang H, Xie S, Tai Y, Lu C (2017) RMPE: regional multi-person pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2353–2362. https://doi.org/10.1109/ICCV.2017.256
Fatemidokht H, Rafsanjani MK, Gupta BB, Hsu CH (2021) Efficient and secure routing protocol based on artificial intelligence algorithms with UAV-Assisted for vehicular ad hoc networks in intelligent transportation systems. IEEE Trans Intell Transp Syst, pp 1–13. https://doi.org/10.1109/TITS.2020.3041746
He K, Gkioxari G, Dollar P, Girshick R (2017) Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2980–2988. https://doi.org/10.1109/ICCV.2017.322
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision-ECCV 2016, Part VI, vol 9910, pp 34–50. https://doi.org/10.1007/978-3-319-46466-4_3
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, vol 37, ICML’15, pp 448–456. JMLR.org, Lille, France
Iqbal U, Milan A, Gall J (2017) Posetrack: joint multi-person pose estimation and tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2011–2020
Itti L, Koch C (2001) Computational modelling of visual attention. Nat Rev Neurosci 2(3):194–203
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259
Ke L, Chang MC, Qi H, Lyu S (2018) Multi-scale structure-aware network for human pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 713–728
Li D, Deng L, Bhooshan Gupta B, Wang H, Choi C (2019) A novel CNN based security guaranteed image watermarking generation scenario for smart city applications. Inf Sci 479:432–447
Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 510–519
Lin TY, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp 936–944. https://doi.org/10.1109/CVPR.2017.106
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision-ECCV 2014, Part V, vol 8693, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
Luo Y, Xu Z, Liu P, Du Y, Guo JM (2018) Multi-person pose estimation via multi-layer fractal network and joints kinship pattern. IEEE Trans Image Process 28(1):142–155
Luvizon DC, Tabia H, Picard D (2019) Human pose regression by combining indirect part detection and contextual information. Comput Graph 85:15–22
Mnih V, Heess N, Graves A (2014) Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, pp 2204–2212
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision-ECCV 2016, Part VIII, vol 9912, pp 483–499. https://doi.org/10.1007/978-3-319-46484-8_29
Ning G, Zhang Z, He Z (2017) Knowledge-guided deep fractal neural networks for human pose estimation. IEEE Trans Multimedia 20(5):1246–1259
Olshausen BA, Anderson CH, Van Essen DC (1993) A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J Neurosci 13(11):4700–4719
Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4903–4911
Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler P, Schiele B (2016) DeepCut: joint subset partition and labeling for multi person pose estimation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4929–4937. https://doi.org/10.1109/CVPR.2016.533
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Sahoo SR, Gupta B (2021) Multiple features based approach for automatic fake news detection on social networks using deep learning. Appl Soft Comput 100:106983
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. arXiv:1409.1556
Su Z, Ye M, Zhang G, Dai L, Sheng J (2019) Cascade feature aggregation for human pose estimation. arXiv preprint arXiv:1902.07837
Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5693–5703
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2818–2826
Tang W, Yu P, Wu Y (2018) Deeply learned compositional models for human pose estimation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision-ECCV 2018. Springer International Publishing, Cham, pp 197–214
Tang Z, Peng X, Geng S, Wu L, Zhang S, Metaxas D (2018) Quantized densely connected u-nets for efficient landmark localization. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 339–354
Toshev A, Szegedy C (2014) DeepPose: human pose estimation via deep neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1653–1660. https://doi.org/10.1109/CVPR.2014.214
Wei SE, Ramakrishna V, Kanade T, Sheikh Y (2016) Convolutional pose machines. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4724–4732. https://doi.org/10.1109/CVPR.2016.511
Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 466–481
Yang W, Li S, Ouyang W, Li H, Wang X (2017) Learning feature pyramids for human pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 1290–1299. https://doi.org/10.1109/ICCV.2017.144
You Q, Jin H, Wang Z, Fang C, Luo J (2016) Image captioning with semantic attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4651–4659
Zhang F, Zhu X, Dai H, Ye M, Zhu C (2020) Distribution-aware coordinate representation for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zhang H, Ouyang H, Liu S, Qi X, Shen X, Yang R, Jia J (2019) Human pose estimation with spatial contextual information. arXiv preprint arXiv:1901.01760
Zhu X, Jiang Y, Luo Z (2017) Multi-person pose estimation for posetrack with enhanced part affinity fields. In: ICCV PoseTrack Workshop, vol 7

Acknowledgements

This work was supported by the Science and Technology Bureau of Quanzhou under Grant 2018C113R, and in part by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082.

Author information

Authors and Affiliations

College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, People’s Republic of China
Zhilong Ou, YanMin Luo & Geng Chen
Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen, 361021, People’s Republic of China
Zhilong Ou, YanMin Luo & Geng Chen
Fujian Normal University, Fujian, People’s Republic of China
Jin Chen

Corresponding author

Correspondence to YanMin Luo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ou, Z., Luo, Y., Chen, J. et al. SRFNet: selective receptive field network for human pose estimation. J Supercomput 78, 691–711 (2022). https://doi.org/10.1007/s11227-021-03889-z

Accepted: 12 May 2021
Published: 31 May 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11227-021-03889-z
