Triplet attention multiple spacetime-semantic graph convolutional network for skeleton-based action recognition

Sun, Yanjing; Huang, Han; Yun, Xiao; Yang, Bin; Dong, Kaiwen

doi:10.1007/s10489-021-02370-x

Published: 24 April 2021

Applied Intelligence, volume 52, pages 113–126 (2022)

Yanjing Sun¹,²,
Han Huang¹,
Xiao Yun¹ (ORCID: orcid.org/0000-0002-1538-5279),
Bin Yang¹ &
Kaiwen Dong¹

Abstract

Skeleton-based action recognition has recently attracted widespread attention in computer vision. Previous methods are susceptible to interference from redundant video frames when judging complex actions, and they ignore the fact that the spatial-temporal features of different actions differ greatly. To solve these problems, we propose a triplet attention multiple spacetime-semantic graph convolutional network for skeleton-based action recognition (AM-GCN). AM-GCN captures multiple spacetime-semantic features from the video images, which avoids the limited information diversity of a single-layer feature representation and improves the generalization ability of the network. We also present a triplet attention mechanism that applies attention to the key points, key channels, and key frames of an action, improving the accuracy and interpretability of judgements on complex actions. In addition, the different kinds of spacetime-semantic feature information are combined through the proposed fusion decision for comprehensive prediction, which improves the robustness of the algorithm. We validate AM-GCN on two standard datasets, NTU-RGBD and Kinetics, and compare it with other mainstream models. The results show that the proposed model achieves a substantial improvement over these models.
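
The abstract does not give the layer definitions, so the following is only a minimal sketch of the two ideas it names: gating a skeleton feature tensor along its joint (key point), channel, and frame axes, and fusing the class scores of several feature streams. Everything here is an assumption made for illustration, not the authors' implementation: the `x[c][t][v]` layout, the parameter-free mean-pool-plus-sigmoid gates (a learned attention module would use trainable weights), the score-averaging fusion rule, and all function names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def axis_gates(x, axis):
    """Return one gate in (0, 1) per index along `axis` of x[c][t][v].

    Assumption: the gate is the sigmoid of the mean over the other two
    axes. This stands in for a learned attention weight.
    """
    sizes = (len(x), len(x[0]), len(x[0][0]))  # (C, T, V)
    sums = [0.0] * sizes[axis]
    for c in range(sizes[0]):
        for t in range(sizes[1]):
            for v in range(sizes[2]):
                sums[(c, t, v)[axis]] += x[c][t][v]
    count = sizes[0] * sizes[1] * sizes[2] / sizes[axis]
    return [sigmoid(s / count) for s in sums]


def triplet_attention(x):
    """Rescale x[c][t][v] by channel, frame, and joint gates."""
    gate_c = axis_gates(x, 0)  # key channels
    gate_t = axis_gates(x, 1)  # key frames
    gate_v = axis_gates(x, 2)  # key points (joints)
    return [
        [
            [x[c][t][v] * gate_c[c] * gate_t[t] * gate_v[v]
             for v in range(len(gate_v))]
            for t in range(len(gate_t))
        ]
        for c in range(len(gate_c))
    ]


def fuse_predictions(stream_logits):
    """Average the softmax scores of several streams.

    Assumption: simple late fusion by averaging; returns the fused
    scores and the index of the highest-scoring class.
    """
    probs = [softmax(logits) for logits in stream_logits]
    num_classes = len(probs[0])
    fused = [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]
    predicted = max(range(num_classes), key=fused.__getitem__)
    return fused, predicted
```

Calling `triplet_attention` on a nested list of shape channels × frames × joints returns a tensor of the same shape, and `fuse_predictions` takes one list of class logits per stream. Averaging is only the simplest possible fusion rule; the paper's fusion decision may weight the streams differently.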

Acknowledgements

This work is supported by the Natural Science Foundation of Jiangsu Province (BK20180640), the National Natural Science Foundation of China (61902404, 512918914, 51734009, 61771417, 61873246), and the State Key Research Development Program (2016YFC0801403).

Author information

Authors and Affiliations

China University of Mining and Technology, Xuzhou, China
Yanjing Sun, Han Huang, Xiao Yun, Bin Yang & Kaiwen Dong
Xuzhou Engineering Research Center of Intelligent Industry Safety and Emergency Collaboration, Xuzhou, China
Yanjing Sun

Corresponding author

Correspondence to Xiao Yun.

About this article

Cite this article

Sun, Y., Huang, H., Yun, X. et al. Triplet attention multiple spacetime-semantic graph convolutional network for skeleton-based action recognition. Appl Intell 52, 113–126 (2022). https://doi.org/10.1007/s10489-021-02370-x

Accepted: 17 March 2021
Published: 24 April 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s10489-021-02370-x
