Abstract
Action recognition plays a central role in intelligent surveillance systems, game control, human-computer interaction, and many other applications. In this work, we design a multi-task framework that improves the recent Spatial-Temporal Graph Convolutional Network (ST-GCN) for skeleton-based action recognition by introducing an attention mechanism and co-occurrence feature learning. Specifically, one branch applies attention so that more discriminative features receive larger weights, while the other branch aggregates co-occurrence features from all joints globally. Additionally, the multi-task framework exploits the inherent correlation between the two branches to further improve classification accuracy and convergence speed. Experiments are carried out on the NTU RGB+D and Kinetics human action datasets. The results show that the proposed multi-task framework achieves distinctly higher accuracy than ST-GCN and other mainstream methods for 3D action recognition.
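To make the two-branch, multi-task design concrete, below is a minimal PyTorch-style sketch of such a head on top of ST-GCN backbone features. The tensor shapes, layer sizes, the squeeze-and-excitation-style channel attention, and the joint-mixing layer are illustrative assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only: a two-branch multi-task head over ST-GCN features.
# Shapes, layer sizes, and fusion choices are assumptions, not the paper's code.
import torch
import torch.nn as nn

class TwoBranchHead(nn.Module):
    def __init__(self, in_channels=256, num_joints=25, num_classes=60):
        super().__init__()
        # Attention branch: channel-wise gating (squeeze-and-excitation style)
        # so more discriminative features are weighted more heavily.
        self.att_fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // 4, in_channels),
            nn.Sigmoid(),
        )
        # Co-occurrence branch: a learned joint-mixing layer lets every joint
        # contribute to every other joint before global aggregation.
        self.joint_mix = nn.Linear(num_joints, num_joints)
        # One classifier per branch; both are supervised with the same action
        # label, so the shared backbone is trained in a multi-task fashion.
        self.cls_att = nn.Linear(in_channels, num_classes)
        self.cls_cooc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        # x: (N, C, T, V) feature map from the ST-GCN backbone
        # --- attention branch ---
        squeeze = x.mean(dim=(2, 3))                    # (N, C) global context
        gate = self.att_fc(squeeze)                     # (N, C) channel weights
        att_feat = (x * gate[:, :, None, None]).mean(dim=(2, 3))
        # --- co-occurrence branch ---
        mixed = self.joint_mix(x.mean(dim=2))           # (N, C, V) mixed over joints
        cooc_feat = mixed.mean(dim=2)                   # (N, C) global aggregation
        return self.cls_att(att_feat), self.cls_cooc(cooc_feat)

# Usage: both sets of logits are trained against the same label; at test time
# their scores can be averaged.
head = TwoBranchHead()
logits_att, logits_cooc = head(torch.randn(8, 256, 75, 25))
```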
References
Baradel F, Wolf C, Mille J (2017) Pose-conditioned spatio-temporal attention for human action recognition. CoRR abs/1703.10106
Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: Computer Vision and Pattern Recognition (CVPR), 2017
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: CVPR, 2017
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1110–1118
Gu J, Wang G, Chen T (2016) Recurrent highway networks with language cnn for image captioning. arXiv preprint arXiv:1612.07086
Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Appl Comput Harmon Anal 30(2):129–150
Hu J, Shen L, Albanie S (2017) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell PP(99):1–1
Jin SY, Choi HJ (2012) Essential body-joint and atomic action detection for human activity recognition using longest common subsequence algorithm. In: ICCV, pp 148–159
Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Viola F, Green T, Back T, Natsev P et al (2017) The kinetics human action video dataset. arXiv preprint arXiv:1705.06950
Ke Q, An S, Bennamoun M, Sohel F, Boussaid F (2017) SkeletonNet: mining deep part features for 3D action recognition. IEEE Signal Processing Letters
Ke Q, Bennamoun M, An S, Sohel F, Boussaid F (2017) A new representation of skeleton sequences for 3D action recognition. In: CVPR, July 2017
Kim TS, Reiter A (2017) Interpretable 3d human action analysis with temporal convolutional networks. In: BNMW CVPRW
Koniusz P, Cherian A, Porikli F (2016) Tensor representations via kernel linearization for action recognition from 3d skeletons. arXiv preprint arXiv:1604.00239
Li D, Chen X, Zhang Z, Huang K (2017) Learning deep context-aware features over body and latent parts for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 384–393
Li C, Zhong Q, Xie D, Pu S (2017) Skeleton-based action recognition with convolutional neural networks. arXiv preprint arXiv:1704.07595
Li W, Zhu X, Gong S (2018) Harmonious attention network for person reidentification. In: CVPR, vol 1, p 2
Li R, Wang S, Zhu F, Huang J (2018) Adaptive graph convolutional neural networks. arXiv preprint arXiv:1801.03226
Li C, Zhong Q, Xie D, Pu S (2018) Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. arXiv preprint arXiv:1804.06055
Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal LSTM with trust gates for 3D human action recognition. In: European conference on computer vision (ECCV). Springer, pp 816–833
Lu G, Zhou Y, Li X (2016) Efficient action recognition via local position offset of 3D skeletal body joints. Multimed Tools Appl 75(6):3479–3494
Nguyen TV (2015) STAP: spatial-temporal attention-aware pooling for action recognition[J]. IEEE Trans Circuits Syst Video Technol 25(1):77–86
Niepert M, Ahmed M, Kutzkov K (2016) Learning convolutional neural networks for graphs. In: International conference on machine learning
Sainath TN, Vinyals O, Senior A, Sak H (2015) Convolutional, long short-term memory, fully connected deep neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 4580–4584
Shahroudy A, Liu J, Ng T-T, Wang G (2016) NTU RGB+D: a large scale benchmark for 3D human activity analysis. In: CVPR, pp 1010–1019
Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional lstm network for skeleton-based action recognition. In: CVPR 2019
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: NIPS, pp 568–576
Song S, Lan C, Xing J, Zeng W, Liu J (2017) An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, February 4–9, 2017, San Francisco, California, USA, pp 4263–4270
Sun B, Kong D, Wang S (2018) Effective human action recognition using global and local offsets of skeleton joints. Multimed Tools Appl:1–25. Published online July 2018
Toshev A, Szegedy C (2013) DeepPose: human pose estimation via deep neural networks. CoRR abs/1312.4659
Wang H, Schmid C (2014) Action recognition with improved trajectories. In: IEEE International Conference on Computer Vision (ICCV)
Wang H et al (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103(1):60–79
Wang J, Liu Z, Wu Y, Yuan J (2014) Learning actionlet ensemble for 3d human action recognition. TPAMI 36(5):914
Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L (2016) Temporal segment networks: towards good practices for deep action recognition. In: ECCV, 2016, p 6
Wang C, Zhang Q, Huang C, Liu W, Wang X (2018) Mancs: a multi-task attentional network with curriculum sampling for person re-identification. In: ECCV 2018, pp 384–400
Weston J, Chopra S, Bordes A (2014) Memory networks. arXiv preprint arXiv:1410.3916
Xia L, Chen C-C, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 20–27
Xu K, Li C, Tian Y, Sonobe T, Kawarabayashi KI, Jegelka S (2018) Representation learning on graphs with jumping knowledge networks. arXiv preprint arXiv:1806.03536
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI
Yeung S, Russakovsky O, Jin N, Andriluka M, Mori G, Fei-Fei L (2015) Every moment counts: dense detailed labeling of actions in complex videos. Int J Comput Vis 126(2–4):375–389
Du Y, Fu Y, Wang L (2016) Skeleton based action recognition with convolutional neural network. In: Pattern Recognition, pp 579–583
Yu Y, Mann GK, Gosine RG (2010) An object-based visual attention model for robotic applications. IEEE Trans Syst Man Cybern B Cybern 40(5):1398–1412
Zhao Y, Xiong Y, Wang L, Wu Z, Tang X, Lin D (2017) Temporal action detection with structured segment networks. In: ICCV
Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep lstm networks. In: AAAI Conference on Artificial Intelligence (AAAI)
Zichao M, Zhixin S (2018) Time-varying LSTM networks for action recognition. Multimed Tools Appl:32275–32285. Published online December 2018
Cite this article
Tian, D., Lu, ZM., Chen, X. et al. An attentional spatial temporal graph convolutional network with co-occurrence feature learning for action recognition. Multimed Tools Appl 79, 12679–12697 (2020). https://doi.org/10.1007/s11042-020-08611-4