An optimization high-resolution network for human pose recognition based on attention mechanism

Multimedia Tools and Applications

Abstract

In the high-resolution network (HRNet), the lower layers of the low-resolution branches can adopt a shallow parallel structure that maintains high-resolution features and highlights global features. However, high-resolution human pose estimation networks suffer from a large number of parameters, high computational complexity, and low recognition precision for similar actions. To address these problems, we propose an optimized HRNet based on an attention mechanism. First, a dilated convolution (DC) module is introduced into cross-channel sampling to capture global features by enlarging the receptive field of the feature map, so that the feature map can cover all the information in the original image. Second, the Squeeze-and-Excitation (SE) channel attention module is introduced into cross-channel feature fusion to learn the correlations between channels; it recalibrates the features, selectively emphasizing informative ones and suppressing secondary ones, which improves recognition precision without changing the parameter count or computational complexity. Finally, experimental results on the KTH dataset show that the HRNet with channel attention and dilated convolution achieves better accuracy.
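The two building blocks named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `receptive_field` and `se_recalibrate`, the 3x3 kernel, the dilation rates, and the weight matrices `w1` and `w2` are all assumptions chosen for illustration. The first function shows how dilation enlarges the receptive field of stacked stride-1 convolutions while each kernel keeps the same number of weights; the second follows the usual SE recipe of squeeze (global average pooling), excitation (two fully connected layers with ReLU and sigmoid), and channel-wise scaling.

```python
import math


def receptive_field(kernel_size, dilations):
    """Per-axis receptive field of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d,
    while the kernel still holds kernel_size * kernel_size weights.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation over C feature maps (each an H x W nested list).

    w1 has shape (C // r) x C and w2 has shape C x (C // r); both are
    hypothetical weights, and biases are omitted for brevity.
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
         for fm in feature_maps]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * x for x in row] for row in fm]
              for fm, g in zip(feature_maps, gates)]
    return scaled, gates


# Three 3x3 layers: plain convolutions versus dilation rates 1, 2, 4.
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

With the assumed dilation rates, three layers see a 15-pixel span instead of 7 using the same kernel size, which is the sense in which dilation widens coverage of the input. In `se_recalibrate`, each gate lies strictly between 0 and 1, so channels are attenuated by different amounts rather than switched on or off.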

Data availability

The datasets generated and/or analyzed during the current study are available from the KTH repository (http://www.nada.kth.se/cvap/actions/) and the COCO 2017 repository (http://cocodataset.org/).

Acknowledgements

This work is partially supported by the Natural Science Foundation of Jiangsu Province (No. BK20181340), and the National Natural Science Foundation of China (No. 61305017).

Author information

Corresponding author

Correspondence to Yu Feng.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, J., Feng, Y. An optimization high-resolution network for human pose recognition based on attention mechanism. Multimed Tools Appl 83, 45535–45552 (2024). https://doi.org/10.1007/s11042-023-16793-w

  • DOI: https://doi.org/10.1007/s11042-023-16793-w
