
Action recognition method based on multi-stream attention-enhanced recursive graph convolution

Published in Applied Intelligence.

Abstract

Skeleton-based action recognition methods have become a research hotspot due to their robustness against variations in lighting, complex backgrounds, and viewpoint changes. To address long-range joint dependencies and time-varying joint correlations in skeleton data, this paper proposes a Multi-Stream Attention-Enhanced Recursive Graph Convolution method for action recognition. The method extracts four types of features from the skeleton data: joints, bones, joint motion, and bone motion. It models the latent relationships between non-adjacent joints through adaptive graph convolution, and uses a long short-term memory network to learn the graph structure recursively and capture the temporal correlations of joints. In addition, a spatio-temporal channel attention module is introduced so that the model focuses on the most informative joints, frames, and channels, further improving performance. Finally, the recognition results of the four branches are fused at the decision level to produce the final prediction. Experimental results on two public datasets (UTD-MHAD, CZU-MHAD) and a self-constructed dataset (KTH-Skeleton) show that the proposed method achieves accuracies of 94.65%, 95.01%, and 97.50%, respectively, demonstrating its effectiveness.
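For illustration, the four input streams and the decision-level fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes joint sequences of shape (frames, joints, coordinates) and a parent list defining the bones, and the function names (`bone_stream`, `motion_stream`, `fuse_scores`) are hypothetical.

```python
# Sketch: deriving joint, bone, joint-motion, and bone-motion streams
# from raw skeleton coordinates, plus decision-level score fusion.
# `joints` has shape (T, V, C): T frames, V joints, C coordinates.
# `parents[v]` is the parent joint of v (the root points to itself).

def bone_stream(joints, parents):
    # A bone vector is each joint's offset from its parent joint.
    return [[[frame[v][c] - frame[parents[v]][c] for c in range(len(frame[v]))]
             for v in range(len(frame))] for frame in joints]

def motion_stream(frames):
    # Motion is the frame-to-frame difference; the last frame gets zeros.
    T = len(frames)
    out = []
    for t in range(T):
        nxt = frames[t + 1] if t + 1 < T else frames[t]
        out.append([[nxt[v][c] - frames[t][v][c]
                     for c in range(len(frames[t][v]))]
                    for v in range(len(frames[t]))])
    return out

def fuse_scores(stream_scores):
    # Decision-level fusion: average per-class scores across streams,
    # then pick the class with the highest fused score.
    n = len(stream_scores)
    num_classes = len(stream_scores[0])
    avg = [sum(s[c] for s in stream_scores) / n for c in range(num_classes)]
    return max(range(num_classes), key=avg.__getitem__)

# Toy skeleton: 2 frames, 3 joints, 2-D coordinates; joint 0 is the root.
joints = [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
          [[0.0, 0.0], [1.0, 1.0], [1.0, 2.0]]]
parents = [0, 0, 1]

bones = bone_stream(joints, parents)          # bone stream
joint_motion = motion_stream(joints)          # joint-motion stream
bone_motion = motion_stream(bones)            # bone-motion stream
```

In the paper each of the four streams is processed by its own attention-enhanced recursive graph-convolution branch before fusion; the averaging in `fuse_scores` stands in for that final decision-level step only.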


(Figures 1–7 appear in the full article.)

Availability of data and materials

The data sets analyzed in the current study are publicly available: https://personal.utdallas.edu/~kehtar/UTD-MHAD.html and https://github.com/yujmo/CZU_MHAD.

Code availability

Not applicable.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61971347), Doctoral Innovation Foundation of Xi’an University of Technology (No. 252072118), Natural Science Foundation of Shaanxi Province of China (No. 2021JM-344), Xi’an Science and Technology Plan Project (2022JH-RYFW-007) (24DCYJSGG0020), Key research and development program of Shaanxi Province (2022SF-353).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Huaijun Wang and Junhuai Li were responsible for the overall research design and for guiding the research, providing direction and assisting in the writing and proofreading of the paper. Hui Ke carried out the experiments, data collection, and data analysis, and also participated in writing the paper. Xiang Wei provided resources as well as academic support and guidance. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Junhuai Li.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, H., Bai, B., Li, J. et al. Action recognition method based on multi-stream attention-enhanced recursive graph convolution. Appl Intell 54, 10133–10147 (2024). https://doi.org/10.1007/s10489-024-05719-0
