Abstract
Skeleton-based action recognition has become a research hotspot because of its robustness to variations in lighting, complex backgrounds, and viewpoint changes. To address long-distance joint associations and time-varying joint correlations in skeleton data, this paper proposes a Multi-Stream Attention-Enhanced Recursive Graph Convolution method for action recognition. The method extracts four types of features from the skeleton data: joints, bones, joint movements, and bone movements. It models the potential relationships between non-adjacent nodes through adaptive graph convolution and uses a long short-term memory network for recursive learning of the graph structure to capture the temporal correlations of joints. In addition, a spatio-temporal channel attention module is introduced so that the model focuses on the important joints, frames, and channel features, further improving performance. Finally, the recognition results of the four branches are fused at the decision level to complete the action recognition. Experiments on two public datasets (UTD-MHAD, CZU-MHAD) and a self-constructed dataset (KTH-Skeleton) show that the proposed method achieves accuracies of 94.65%, 95.01%, and 97.50%, respectively, demonstrating its effectiveness.
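The four input streams and the decision-level fusion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the convention common in multi-stream skeleton models, where a bone is the vector from a parent joint to its child and a movement is the difference between consecutive frames, and it assumes fusion by summing per-class scores. The names `bone_stream`, `motion_stream`, `four_streams`, `fuse_scores`, and the `parents` list are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[float, ...]   # one (x, y, z) coordinate
Frame = List[Joint]         # all joints of one frame


def bone_stream(frames: Sequence[Frame], parents: Sequence[int]) -> List[Frame]:
    """Assumed bone definition: joint v minus its parent joint.

    The root joint is its own parent, so its bone vector is zero.
    """
    return [
        [tuple(c - p for c, p in zip(frame[v], frame[parents[v]]))
         for v in range(len(frame))]
        for frame in frames
    ]


def motion_stream(frames: Sequence[Frame]) -> List[Frame]:
    """Assumed movement definition: frame t+1 minus frame t.

    The last frame is zero-padded so the sequence length is unchanged.
    """
    motion = [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]
    motion.append([tuple(0.0 for _ in joint) for joint in frames[-1]])
    return motion


def four_streams(frames: Sequence[Frame], parents: Sequence[int]) -> Dict[str, List[Frame]]:
    """Build the joint, bone, joint-movement, and bone-movement streams."""
    bones = bone_stream(frames, parents)
    return {
        "joint": list(frames),
        "bone": bones,
        "joint_motion": motion_stream(frames),
        "bone_motion": motion_stream(bones),
    }


def fuse_scores(stream_scores: Sequence[Sequence[float]]) -> int:
    """Assumed decision-level fusion: sum class scores over streams, take the argmax."""
    totals = [sum(per_class) for per_class in zip(*stream_scores)]
    return max(range(len(totals)), key=totals.__getitem__)


if __name__ == "__main__":
    # Toy skeleton: 3 joints in a chain (0 -> 1 -> 2), 2 frames.
    frames = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0)],
    ]
    streams = four_streams(frames, parents=[0, 0, 1])
    print(streams["bone"][0])
    # Made-up per-class scores from four hypothetical stream classifiers.
    print(fuse_scores([[0.1, 0.9], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]))
```

In a full model, each stream would feed its own network branch, and `fuse_scores` would receive the class scores produced by the four branches; weighted sums are an equally plausible fusion choice that the abstract does not rule out.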







Availability of data and materials
The datasets analyzed in the current study are publicly available: https://personal.utdallas.edu/~kehtar/UTD-MHAD.html and https://github.com/yujmo/CZU_MHAD.
Code availability
Code availability not applicable.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61971347), the Doctoral Innovation Foundation of Xi’an University of Technology (No. 252072118), the Natural Science Foundation of Shaanxi Province of China (No. 2021JM-344), the Xi’an Science and Technology Plan Project (2022JH-RYFW-007) (24DCYJSGG0020), and the Key Research and Development Program of Shaanxi Province (2022SF-353).
Author information
Contributions
All authors contributed to the study conception and design. Huaijun Wang and Junhuai Li were responsible for the overall research design, provided research direction and guidance, and assisted in writing and proofreading the paper. Hui Ke participated in the experiments, data collection, and data analysis, and also contributed to writing the paper. Xiang Wei provided resource support as well as academic support and guidance. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
All authors declare that they have no conflicts of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Wang, H., Bai, B., Li, J. et al. Action recognition method based on multi-stream attention-enhanced recursive graph convolution. Appl Intell 54, 10133–10147 (2024). https://doi.org/10.1007/s10489-024-05719-0
DOI: https://doi.org/10.1007/s10489-024-05719-0