Abstract
Skeleton-based action recognition is a challenging research problem: the temporal information in a human skeleton sequence is more difficult to extract than the spatial information, and many researchers have therefore turned to graph convolutional networks. In this study, a two-stream action recognition method called RNXt-GCN is proposed on the basis of the Spatial-Temporal Graph Convolutional Network (ST-GCN). The human skeleton sequence is first converted into a spatial-temporal graph and a SkeleMotion image, which are fed into ST-GCN and ResNeXt, respectively, for spatial-temporal convolution; the convolved features are then fused. The proposed method models the temporal information of an action through both its amplitude and direction, addressing the isolated treatment of temporal information in ST-GCN. Experiments are performed comprehensively on four datasets: 1) UTD-MHAD, 2) Northwestern-UCLA, 3) NTU RGB+D 60, and 4) NTU RGB+D 120. The proposed model shows very competitive results compared with other models in our experiments, and on the NTU RGB+D 120 dataset it outperforms state-of-the-art two-stream models.
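The fusion of the two streams described above can be sketched as a simple score-level combination. The sketch below is illustrative only: the feature extractors, the fusion weight `alpha`, and the function names are assumptions for exposition, not the authors' exact implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_two_stream(gcn_logits, cnn_logits, alpha=0.5):
    """Late (score-level) fusion of the two streams.

    gcn_logits: class scores from the ST-GCN stream (spatial-temporal graph).
    cnn_logits: class scores from the ResNeXt stream (SkeleMotion image).
    alpha: hypothetical fusion weight for the graph stream; the paper's
    actual fusion strategy may differ.
    """
    fused = [alpha * g + (1.0 - alpha) * c
             for g, c in zip(gcn_logits, cnn_logits)]
    return softmax(fused)

# Toy example with four action classes.
probs = fuse_two_stream([2.0, 0.5, -1.0, 0.1], [1.5, 0.8, -0.5, 0.0])
predicted = max(range(len(probs)), key=lambda i: probs[i])
```

A weighted sum of per-class scores followed by softmax is a common late-fusion choice for two-stream models; feature-level fusion (concatenating the convolved features before the classifier) is the other standard option.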
References
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI, pp 1–8
Xie S, Girshick R, Dollár P et al (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1492–1500
Chen C (2015) UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: IEEE International Conference on Image Processing (ICIP)
Wang J, Nie X, Xia Y et al (2014) Cross-view action modeling, learning and recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2649–2656
Shahroudy A, Liu J, Ng TT et al (2016) NTU RGB+D: a large scale dataset for 3D human activity analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1010–1019
Liu J, Shahroudy A, Perez M et al (2019) NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding. IEEE Trans Pattern Anal Mach Intell
Li J, Wong Y, Zhao Q et al (2018) Unsupervised learning of view-invariant action representations. In: Advances in Neural Information Processing Systems (NIPS), pp 1254–1264
Fiorini L, Mancioppi G, Semeraro F, Fujita H, Cavallo F (2020) Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowl-Based Syst 190:105217
Han F, Reily B, Hoff W, Zhang H (2017) Space-time representation of people based on 3D skeletal data: a review. Comput Vis Image Underst 158:85–105
Wang P, Li W, Ogunbona P, Wan J, Escalera S (2018) RGB-D-based human motion recognition with deep learning: a survey. Comput Vis Image Underst 171:118–139
Liu M, Liu H, Chen C (2017) Enhanced skeleton visualization for view invariant human action recognition. Pattern Recogn
Zhang P, Lan C, Xing J, Zeng W, Xue J, Zheng N (2019) View adaptive neural networks for high performance skeleton-based human action recognition. IEEE Trans Pattern Anal Mach Intell 41:1963–1978
Hou Y, Li Z, Wang P, Li W (2018) Skeleton optical spectra-based action recognition using convolutional neural networks. IEEE Trans Circuits Syst Video Technol 28(3):807–811
Li S, Li W, Cook C et al (2018) Independently recurrent neural network (IndRNN): building a longer and deeper RNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5457–5466
Hu G, Cui B, Yu S (2019) Skeleton-based action recognition with synchronous local and non-local spatio-temporal learning and frequency attention. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp 1216–1221
Li C, Zhong Q, Xie D et al (2018) Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), pp 786–792
Ke Q, An S, Bennamoun M, Sohel F, Boussaid F (2017) SkeletonNet: mining deep part features for 3-D action recognition. IEEE Signal Process Lett 24(6):731–735
Liu J, Shahroudy A, Xu D, Kot AC, Wang G (2018) Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Trans Pattern Anal Mach Intell 40(12):3007–3021
Si C et al (2019) An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1227–1236
Shi L, Zhang Y, Cheng J et al (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 12026–12035
Wu C, Wu XJ, Kittler J (2019) Spatial residual layer and dense connection block enhanced spatial temporal graph convolutional network for skeleton-based action recognition. In: Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW)
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems (NIPS), pp 568–576
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: ICLR
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR
He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778
Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1–9
Du Y, Fu Y, Wang L (2015) Skeleton based action recognition with convolutional neural network. In: 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp 579–583
Choutas V, Weinzaepfel P, Revaud J et al (2018) PoTion: pose motion representation for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7024–7033
Ke Q, Bennamoun M, An S et al (2017) A new representation of skeleton sequences for 3D action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3288–3297
Li C, Zhong Q, Xie D et al (2017) Skeleton-based action recognition with convolutional neural networks. In: IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp 597–600
Liu M, Chen C, Liu H (2017) 3D action recognition using data visualization and convolutional neural networks. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp 925–930
Wang P, Li W, Li C, Hou Y (2018) Action recognition based on joint trajectory maps with convolutional neural networks. Knowl-Based Syst 158:43–53
Wang P, Li Z, Hou Y et al (2016) Action recognition based on joint trajectory maps using convolutional neural networks. In: Proceedings of the 24th ACM International Conference on Multimedia, pp 102–106
Yang Z, Li Y, Yang J et al (2018) Action recognition with spatio-temporal visual attention on skeleton image sequences. IEEE Trans Circuits Syst Video Technol
Caetano C, Sena J, Brémond F et al (2019) SkeleMotion: a new representation of skeleton joint sequences based on motion information for 3D action recognition. In: 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp 1–8
Zhou L, Li W, Zhang Y et al (2014) Discriminative key pose extraction using extended LC-KSVD for action recognition. In: International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp 1–8
Wang J, Liu Z, Wu Y, Yuan J (2014) Learning actionlet ensemble for 3D human action recognition. IEEE Trans Pattern Anal Mach Intell 36(5):914–927
Vemulapalli R, Arrate F, Chellappa R (2014) Human action recognition by representing 3D skeletons as points in a Lie group. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 588–595
Li M, Chen S, Chen X et al (2019) Actional-structural graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3595–3603
Caetano C, Brémond F, Schwartz WR (2019) Skeleton image representation for 3D action recognition based on tree structure and reference joints. In: 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pp 16–23
Shi L, Zhang Y, Cheng J, Lu H (2019) Skeleton-based action recognition with directed graph neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7912–7921
Liu J, Akhtar N, Mian A (2019) Skepxels: spatio-temporal image representation of human skeleton joints for action recognition. In: CVPR Workshops
Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal LSTM with trust gates for 3D human action recognition. In: European Conference on Computer Vision (ECCV), pp 816–833
Liu J, Wang G, Duan L-Y, Abdiyeva K, Kot AC (2017) Skeleton-based human action recognition with global context-aware attention LSTM networks. IEEE Trans Image Process 27(4):1586–1599
Ke Q, Bennamoun M, An S, Sohel F, Boussaid F (2018) Learning clip representations for skeleton-based 3D action recognition. IEEE Trans Image Process 27(6):2842–2855
Liu M, Yuan J (2018) Recognizing human actions as the evolution of pose estimation maps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1159–1168
Dong J, Gao Y, Lee HJ, Zhou H, Yao Y, Fang Z, Huang B (2020) Action recognition based on the fusion of graph convolutional networks with high order features. Appl Sci 10(4):1482
Acknowledgments
This work was supported in part by the project of the Jilin Provincial Science and Technology Department under Grant 20180201003GX and by the project of the Jilin Province Development and Reform Commission under Grant 2019C053-4. The authors gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the presentation.
Funding
This research was funded by the project of the Jilin Provincial Science and Technology Department under Grant 20180201003GX; the article processing charge (APC) was also funded by Grant 20180201003GX.
Author information
Authors and Affiliations
Contributions
This study was completed jointly by the co-authors. Shuhua Liu conceived the research and wrote the draft. Xiaoying Bai and Ming Fang undertook the major experiments and analyses. Lanting Li was responsible for data processing and drawing the figures. Chih-Cheng Hung edited and reviewed the paper. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Liu, S., Bai, X., Fang, M. et al. Mixed graph convolution and residual transformation network for skeleton-based action recognition. Appl Intell 52, 1544–1555 (2022). https://doi.org/10.1007/s10489-021-02517-w