An optimization high-resolution network for human pose recognition based on attention mechanism

Yang, Jinlong; Feng, Yu

doi:10.1007/s11042-023-16793-w

An optimization high-resolution network for human pose recognition based on attention mechanism

Published: 23 October 2023

Volume 83, pages 45535–45552, (2024)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

Jinlong Yang¹ &
Yu Feng¹

116 Accesses
Explore all metrics

Abstract

In the high-resolution network (HRNet), the low layer of low resolution part can adopt shallow parallel network structure to maintain the high-resolution features and highlight global features. However, the high-resolution human posture estimation network has the problems of large amount of network parameters, high complex calculation and low recognition precision of similar actions. To solve these problems, we proposed an optimized HRNet based on attention mechanism. Firstly, the dilated convolution (DC) module is introduced into cross-channel sampling to obtain global features by increasing the receptive field of the feature map, which ensures that the feature map can cover all the information of the original image; Secondly, the channel attention Squeeze-and-Excitation (SE) module is introduced in the process of cross-channel feature fusion to learn the correlations, which can recalibrate the features, highlight the information features selectively and suppress the secondary features, improving the recognition precision without changing the parameter quantity and operation complexity; Finally, the experiment results on KTH dataset show that the HRNet with channel attention mechanism and dilated convolution has better accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 3

Human pose estimation based on feature enhancement and multi-scale feature fusion

Article 18 June 2022

An improved lightweight high-resolution network based on multi-dimensional weighting for human pose estimation

Article Open access 04 May 2023

Scale-aware attention-based multi-resolution representation for multi-person pose estimation

Article 01 May 2021

Data availability

The datasets generated during and/or analyzed during the current study are available in the [KTH, Coco2017] repository, [http://www.nada.kth.se/cvap/actions/], [http://cocodataset.org/].

References

Peng C (2015) Pose Estimation Using Local Adjustment with Mixtures-of-parts Models. J Fiber Bioeng Informat 8(2):249–258
Article Google Scholar
Bin X, Haiping W, Yichen W (2018) Simple Baselines for Human Pose Estimation and Tracking. Proc Eur Conference Comput Vis:1208-1215
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. Proc IEEE Comput Soc Conference Comput Vis Pattern Recog 1:886–893
Google Scholar
Sang S, Huang Z, Kang Z (2018) A Human Activity Recognition Method using the Maximum Optical Flow based Feature Bounding Box. Proc Int Conference Machine Learn Compu:1330-1337
Wu Y, Wei L, Duan Y (2021) Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition. Comput Intell 99:11–23
Google Scholar
Nazir S, Yousaf MH, Velastin SA (2018) Evaluating a bag-of-visual features approach using spatio-temporal features for action recognition. Comput Electric Eng 72:660–669
Article Google Scholar
Papandreou G, Zhu T, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. Proc Int Conf Comput Vis, 144-158
Huang S-L, Gong M-M, Tao D-C (2017) A coarse-fine network for keypoint localization. Proc Int Conference Comput Vis:244-252
Wang H, Schmid C (2013) Action recognition with improved trajectories. Proc IEEE Int Conf Comput Vis:3551-3558
Huang J, Zhu Z, Guo F, Huang G (2020) The devil is in the details: Delving into unbiased data processing for human pose estimation. Proc Eur Conf Comput Vis :246–256
Liu H, Tu J, Liu M (2017) Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition. Proc IEEE Conf Comput Vis Pattern Recog:1669–1676
Zhang Z, Hu Y, Chan S et al (2008) Motion context: A new representation for human action recognition. Proceedings of the European Conference on Computer Vision:817–829
Patel CI, Labana D, Pandya S et al (2020) Histogram of oriented gradient-based fusion of features for human action recognition in action video sequences. Sensors 20(24):7299
Article Google Scholar
Nazir S, Yousaf MH, Velastin SA (2018) Evaluating a bag-of-visual features approach using spatio-temporal features for action recognition. Computers & Electrical Engineering 72:660–669
Article Google Scholar
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. Proceedings of the European conference on computer vision:483–499
Kaiming H, Xiangyu Z, Shaoqing R, et al (2016) Deep residual learning for image recognition. Proc IEEE Conf Comput Vis Pattern Recog:770-778
Sun K, Xiao B, Liu D, et al (2019) Deep High-Resolution Representation Learning for Human Pose Estimation. Proc IEEE Conf Comput Vis Pattern Recog:5693-5703
Abhronil S, Yuting et al (2019) Going Deeper in Spiking Neural Networks: VGG and Residual Architectures. Front Neurosci 13:95
Article Google Scholar
Zhou Z, Siddiquee MMR, Tajbakhsh N, et al (2018) Unet++: A nested u-net architecture for medical image segmentation. Deep Learn Med Image Anal Multimod Learn Clin Decision Support:3-11
Papandreou G, Zhu T, Chen LC, et al (2018) Personlab: Person pose estimation and instance segmentation with a bottom-up part-based geometric embedding model. Proc Eur Conf Comput Vis:269-286
Geng Z, Sun K, Xiao B, et al (2019) Bottom-Up Human Pose Estimation via Disentangled Keypoint Regression. Proc IEEE Data Driven Control Learn Syst Conf:174-187
Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. Proc Int Conf Learn Represent 11:122
Google Scholar
Jin S, Ma X, Han Z et al (2017) Towards multi-person pose tracking: Bottom-up and top-down methods. Proc IEEE Int Conf Comput Vis 2(3):7–18
Google Scholar
Pavlakos G, Zhou X, Derpanis KG, Daniilidis K (2017) Coarse-to-fifine volumetric prediction for single-image 3D human pose. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Mehta D, Rhodin D, Casas P (2017) Monocular 3d human pose estimation in the wild using improved cnn supervision. Proc IEEE Int Conf Comput Vis:506-516
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. Proc IEEE conf Comput Vis Pattern Recogn:7132-7141
Martinez J, Hossain R, Little JJ (2017) A simple yet effective baseline for 3d human pose estimation. Proc IEEE Int Conf Comput Vis:218-223
Sun X, Shang J, Liang S, Wei Y (2017) Compositional human pose regression. Proc IEEE Int Conf Comput Vis:2702-2706
Yang W, Ouyang W, Wang X (2018) 3d human pose estimation in the wild by adversarial learning. Proc IEEE Conf Comput Vis Pattern Recog:443-451
Cao Z, Simon T, Wei S E, et al (2017) Realtime multi-person 2d pose estimation using part affinity fields. Proc IEEE Conf Comput Vis Pattern Recog:7291-7299
Xiao S, Bin X, Yichen W (2018) Integral Human Pose Regression. Proc IEEE Eur Conf Comput Vis:1024-1032
Cheng B, Xiao B, Wang J, et al (2020) Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation. Proc IEEE/CVF Conf Comput Vis Pattern Recog:5386-5395
Congcong L, Jie Y, Haima Y et al (2021) Improved human action recognition approach based on two-stream convolutional neural network model. Vis Comput 37(6):1327–1341
Article Google Scholar
Geng Z, Sun K, Xiao B, et al (2021) Bottom-up human pose estimation via disentangled keypoint regression. Proc IEEE/CVF Conf Comput Vis Pattern Recog:14676-14686
Yu F, Koltun V (2016) Multi-Scale Context Aggregation by Dilated Convolutions. Proc Int Conf Learn Represent:446-456
Chen LC, Papandreou G, Kokkinos I et al (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Article Google Scholar

Download references

Acknowledgements

This work is partially supported by the Natural Science Foundation of Jiangsu Province (No. BK20181340), and the National Natural Science Foundation of China (No. 61305017).

Author information

Authors and Affiliations

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China
Jinlong Yang & Yu Feng

Authors

Jinlong Yang
View author publications
You can also search for this author in PubMed Google Scholar
Yu Feng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yu Feng.

Ethics declarations

Competing interests

The authors declare that they have no known competing fnancial interests or personal relationships that could have appeared to infuence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Yang, J., Feng, Y. An optimization high-resolution network for human pose recognition based on attention mechanism. Multimed Tools Appl 83, 45535–45552 (2024). https://doi.org/10.1007/s11042-023-16793-w

Download citation

Received: 22 August 2022
Revised: 24 July 2023
Accepted: 31 August 2023
Published: 23 October 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11042-023-16793-w

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An optimization high-resolution network for human pose recognition based on attention mechanism

Abstract

Access this article

Similar content being viewed by others

Human pose estimation based on feature enhancement and multi-scale feature fusion

An improved lightweight high-resolution network based on multi-dimensional weighting for human pose estimation

Scale-aware attention-based multi-resolution representation for multi-person pose estimation

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

An optimization high-resolution network for human pose recognition based on attention mechanism

Abstract

Access this article

Similar content being viewed by others

Human pose estimation based on feature enhancement and multi-scale feature fusion

An improved lightweight high-resolution network based on multi-dimensional weighting for human pose estimation

Scale-aware attention-based multi-resolution representation for multi-person pose estimation

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation