
Mixed graph convolution and residual transformation network for skeleton-based action recognition

Published in: Applied Intelligence

Abstract

Skeleton-based action recognition is a highly challenging research problem, and the temporal information contained in a skeleton sequence is more difficult to extract than the spatial information. Many researchers have applied graph convolutional networks to this task. In this study, a two-stream action recognition network called RNXt-GCN is proposed on the basis of the Spatial-Temporal Graph Convolutional Network (ST-GCN). The human skeleton is first converted into a spatial-temporal graph and a SkeleMotion image, which are fed into the ST-GCN and ResNeXt streams, respectively, for spatial-temporal convolution, and the convolved features are then fused. The proposed method models the temporal information of an action through its amplitude and direction, addressing the isolated treatment of temporal information in the ST-GCN. Experiments are performed comprehensively on four datasets: 1) UTD-MHAD, 2) Northwestern-UCLA, 3) NTU RGB+D 60, and 4) NTU RGB+D 120. The proposed model shows very competitive results compared with other models, and on the NTU RGB+D 120 dataset it outperforms state-of-the-art two-stream models.
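The second-stream input described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: a SkeleMotion-style representation is reduced here to per-joint motion magnitude and direction between frames, and the two-stream fusion is shown as a simple weighted sum of class scores (the paper fuses convolved features; the function names, the score-level fusion rule, and the `alpha` weight are assumptions made for this example).

```python
import numpy as np

def skelemotion_maps(joints, d=1):
    """Per-joint motion magnitude and direction from a skeleton sequence.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    d: temporal displacement between the compared frames.
    Returns (magnitude, orientation) of shapes (T-d, J) and (T-d, J, 3).
    """
    motion = joints[d:] - joints[:-d]                # displacement vectors
    magnitude = np.linalg.norm(motion, axis=-1)      # amplitude of the action
    # direction of the action as unit vectors; guard against zero motion
    orientation = motion / np.maximum(magnitude[..., None], 1e-8)
    return magnitude, orientation

def late_fuse(scores_gcn, scores_cnn, alpha=0.5):
    """Weighted sum of per-class scores from the ST-GCN and ResNeXt streams."""
    return alpha * scores_gcn + (1 - alpha) * scores_cnn
```

For example, a sequence in which every joint translates by (1, 1, 1) between the first two frames and then stops yields a magnitude of sqrt(3) for the first displacement and 0 for the second, so the two maps separate how far and in which direction each joint moved.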



Acknowledgments

This work was partially supported by the project of the Jilin Provincial Science and Technology Department under Grant 20180201003GX and the project of the Jilin Province Development and Reform Commission under Grant 2019C053-4. The authors gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the presentation.

Funding

This research was funded by the project of the Jilin Provincial Science and Technology Department under Grant 20180201003GX, which also funded the APC.

Author information


Contributions

This study was completed by the co-authors. Shuhua Liu conceived the research and wrote the draft. The major experiments and analyses were undertaken by Xiaoying Bai and Ming Fang. Lanting Li was responsible for data processing and drawing figures. Chih-Cheng Hung edited and reviewed the paper. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Chih-Cheng Hung.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, S., Bai, X., Fang, M. et al. Mixed graph convolution and residual transformation network for skeleton-based action recognition. Appl Intell 52, 1544–1555 (2022). https://doi.org/10.1007/s10489-021-02517-w

