Action recognition method based on multi-stream attention-enhanced recursive graph convolution

Wang, Huaijun; Bai, Bingqian; Li, Junhuai; Ke, Hui; Xiang, Wei

doi:10.1007/s10489-024-05719-0

Published: 07 August 2024

Volume 54, pages 10133–10147, (2024)

Applied Intelligence

Huaijun Wang^1,2, Bingqian Bai^1, Junhuai Li^1,2 (ORCID: orcid.org/0000-0001-5483-5175), Hui Ke^1 & Wei Xiang^3

239 Accesses

Abstract

Skeleton-based action recognition has become an active research topic because skeleton data are robust to variations in lighting, complex backgrounds, and viewpoint changes. To address long-distance joint associations and time-varying joint correlations in skeleton data, this paper proposes a Multi-Stream Attention-Enhanced Recursive Graph Convolution method for action recognition. The method extracts four types of features from the skeleton data: joints, bones, joint movements, and bone movements. It models the potential relationships between non-adjacent nodes through adaptive graph convolution and uses a long short-term memory network for recursive learning of the graph structure, capturing the temporal correlations of joints. In addition, a spatio-temporal channel attention module lets the model focus on important joints, frames, and channel features, further improving performance. Finally, the recognition results of the four branches are fused at the decision level to produce the final prediction. Experiments on two public datasets (UTD-MHAD, CZU-MHAD) and a self-constructed dataset (KTH-Skeleton) show that the proposed method achieves accuracies of 94.65%, 95.01%, and 97.50%, respectively, demonstrating its effectiveness.
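The four input streams and the decision-level fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact feature definitions or fusion weights, so the sketch assumes the definitions commonly used in multi-stream skeleton models (a bone is the vector from a parent joint to a child joint, a movement is the difference between consecutive frames) and assumes fusion by a uniformly weighted sum of per-stream softmax scores. The names `four_streams`, `fuse_scores`, and `bone_pairs` are hypothetical.

```python
import numpy as np


def four_streams(joints, bone_pairs):
    """Derive four feature streams from one skeleton sequence.

    joints:     float array of shape (T, V, C): frames, joints, coordinates.
    bone_pairs: list of (child, parent) joint indices (assumed skeleton layout).
    """
    # Bone stream (assumed definition): child joint minus parent joint.
    bones = np.zeros_like(joints)
    for child, parent in bone_pairs:
        bones[:, child] = joints[:, child] - joints[:, parent]

    # Movement streams (assumed definition): difference between consecutive
    # frames, with the last frame left as zeros to keep the length T.
    joint_motion = np.zeros_like(joints)
    joint_motion[:-1] = joints[1:] - joints[:-1]
    bone_motion = np.zeros_like(bones)
    bone_motion[:-1] = bones[1:] - bones[:-1]

    return {
        "joint": joints,
        "bone": bones,
        "joint_motion": joint_motion,
        "bone_motion": bone_motion,
    }


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def fuse_scores(stream_logits, weights=None):
    """Decision-level fusion: weighted sum of per-stream class probabilities.

    stream_logits: dict mapping stream name to a (num_classes,) logit vector.
    weights:       optional dict of per-stream weights; uniform if omitted
                   (an assumption, the paper's weights are not stated here).
    """
    names = list(stream_logits)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    fused = sum(weights[n] * softmax(np.asarray(stream_logits[n], dtype=float))
                for n in names)
    return int(np.argmax(fused)), fused
```

In a full model, each stream would be fed to its own network branch, and the logits passed to `fuse_scores` would be the outputs of those branches.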

Availability of data and materials

The datasets analyzed in the current study are publicly available: https://personal.utdallas.edu/~kehtar/UTD-MHAD.html and https://github.com/yujmo/CZU_MHAD.

Code availability

Not applicable.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61971347), the Doctoral Innovation Foundation of Xi’an University of Technology (No. 252072118), the Natural Science Foundation of Shaanxi Province of China (No. 2021JM-344), the Xi’an Science and Technology Plan Project (Nos. 2022JH-RYFW-007, 24DCYJSGG0020), and the Key Research and Development Program of Shaanxi Province (No. 2022SF-353).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Xi’an University of Technology, No. 5 South Jinhua Road, Xi’an, 710048, Shaanxi, China
Huaijun Wang, Bingqian Bai, Junhuai Li & Hui Ke
Shaanxi Key Laboratory for Network Computing and Security Technology, No. 5 South Jinhua Road, Xi’an, 710048, Shaanxi, China
Huaijun Wang & Junhuai Li
School of Computing, Engineering and Mathematical Sciences, La Trobe University, Melbourne, 3086, Australia
Wei Xiang

Contributions

All authors contributed to the study conception and design. Huaijun Wang and Junhuai Li were responsible for the overall research design, provided research direction and guidance, and assisted in writing and proofreading the paper. Hui Ke participated in the experiments, data collection, and data analysis, and contributed to writing the paper. Wei Xiang provided resources as well as academic support and guidance. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Junhuai Li.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H., Bai, B., Li, J. et al. Action recognition method based on multi-stream attention-enhanced recursive graph convolution. Appl Intell 54, 10133–10147 (2024). https://doi.org/10.1007/s10489-024-05719-0

Accepted: 27 July 2024
Published: 07 August 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s10489-024-05719-0
