Human pose estimation based on feature enhancement and multi-scale feature fusion

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Human pose estimation has improved greatly with the development of deep neural networks. However, the task still faces challenges such as occlusions and the widely varying scales of the human body in images. In this study, we propose a novel convolutional neural network architecture based on a dual-attention mechanism and multi-scale feature fusion to predict keypoints and estimate the locations of human body parts in images. First, the feature enhancement module (FEM) enhances the local features in each feature map of the network using dual attention: channel attention selects the channels that deserve more attention, and spatial attention strengthens the local features of each feature map at the spatial level. Second, we design a multi-scale feature fusion (MSFF) module that uses a cascade of atrous convolutions to aggregate contextual information and enhance the expressiveness of the features. Expanding the receptive field enriches the multi-scale contextual information, which helps to detect adjacent keypoints. Finally, we introduce an improved upsampling module that jointly uses 2D upsampling and transposed convolution to better regress the feature maps to higher resolution and output heatmaps. Extensive experiments on the MPII and COCO human pose estimation benchmarks demonstrate the effectiveness of our network.
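To make the three modules concrete, the following is a minimal PyTorch sketch of how such an architecture could be assembled. It assumes a CBAM-style channel-then-spatial attention for the FEM, a cascade of dilated 3x3 convolutions for the MSFF, and a head that sums a bilinear upsampling path with a transposed-convolution path; all layer sizes, reduction ratios, dilation rates, and the fusion strategy are illustrative assumptions, not the authors' published configuration.

# Illustrative sketch of the modules described in the abstract (assumed design,
# not the authors' implementation).
import torch
import torch.nn as nn


class FeatureEnhancementModule(nn.Module):
    """Dual attention: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatially, re-weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)                      # emphasize informative channels
        avg_map = x.mean(dim=1, keepdim=True)            # per-location statistics
        max_map, _ = x.max(dim=1, keepdim=True)
        attn = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x * attn                                  # enhance local features spatially


class MultiScaleFeatureFusion(nn.Module):
    """Cascade of atrous (dilated) convolutions to aggregate multi-scale context."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs, feat = [], x
        for branch in self.branches:                     # cascaded: each stage widens the field
            feat = branch(feat)
            outs.append(feat)
        return self.fuse(torch.cat(outs, dim=1)) + x     # fuse contexts, keep a residual path


class UpsampleHead(nn.Module):
    """Joint 2D upsampling and transposed convolution, then heatmap regression."""

    def __init__(self, channels: int, num_joints: int = 17):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.deconv = nn.ConvTranspose2d(channels, channels,
                                         kernel_size=4, stride=2, padding=1)
        self.head = nn.Conv2d(channels, num_joints, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.up(x) + self.deconv(x)                  # combine the two upsampling paths
        return self.head(x)                              # one heatmap per keypoint


if __name__ == "__main__":
    feats = torch.randn(1, 256, 64, 48)                  # stand-in backbone feature map
    x = FeatureEnhancementModule(256)(feats)
    x = MultiScaleFeatureFusion(256)(x)
    heatmaps = UpsampleHead(256)(x)
    print(heatmaps.shape)                                # torch.Size([1, 17, 128, 96])

In this sketch the three modules are chained once on a single backbone feature map; the paper applies the FEM to each feature map of the network, so the placement shown here is only one possible arrangement.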



Acknowledgements

This research was partially supported by the Beijing Natural Science Foundation (No. 4212025) and the National Natural Science Foundation of China (Nos. 61876018, 61906014, 61976017).

Author information

Corresponding author

Correspondence to Weibin Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was not applicable to this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Cao, D., Liu, W., Xing, W. et al. Human pose estimation based on feature enhancement and multi-scale feature fusion. SIViP 17, 643–650 (2023). https://doi.org/10.1007/s11760-022-02271-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11760-022-02271-7

Keywords

Navigation