Abstract
Action recognition plays a central role in intelligent surveillance systems, game control, human-computer interaction, and many other applications. In this work, we design a multi-task framework that improves the recent Spatial-Temporal Graph Convolutional Network (ST-GCN) for skeleton-based action recognition by introducing an attention mechanism and co-occurrence feature learning. Specifically, one branch applies attention so that more discriminative features receive larger weights, while the other branch aggregates co-occurrence features from all joints globally. Additionally, the multi-task framework exploits the inherent correlation between the two branches to further improve classification accuracy and convergence speed. Experiments are carried out on the NTU RGB+D and Kinetics human action datasets. The results show that the proposed multi-task framework achieves distinctly higher accuracy than ST-GCN and other mainstream methods for 3D action recognition.
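To make the two-branch, multi-task design concrete, below is a minimal PyTorch-style sketch of such a head on top of ST-GCN backbone features. The tensor shapes, layer sizes, the squeeze-and-excitation-style channel attention, and the joint-mixing layer are illustrative assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only: a two-branch multi-task head over ST-GCN features.
# Shapes, layer sizes, and fusion choices are assumptions, not the paper's code.
import torch
import torch.nn as nn

class TwoBranchHead(nn.Module):
    def __init__(self, in_channels=256, num_joints=25, num_classes=60):
        super().__init__()
        # Attention branch: channel-wise gating (squeeze-and-excitation style)
        # so more discriminative features are weighted more heavily.
        self.att_fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // 4, in_channels),
            nn.Sigmoid(),
        )
        # Co-occurrence branch: a learned joint-mixing layer lets every joint
        # contribute to every other joint before global aggregation.
        self.joint_mix = nn.Linear(num_joints, num_joints)
        # One classifier per branch; both are supervised with the same action
        # label, so the shared backbone is trained in a multi-task fashion.
        self.cls_att = nn.Linear(in_channels, num_classes)
        self.cls_cooc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        # x: (N, C, T, V) feature map from the ST-GCN backbone
        # --- attention branch ---
        squeeze = x.mean(dim=(2, 3))                    # (N, C) global context
        gate = self.att_fc(squeeze)                     # (N, C) channel weights
        att_feat = (x * gate[:, :, None, None]).mean(dim=(2, 3))
        # --- co-occurrence branch ---
        mixed = self.joint_mix(x.mean(dim=2))           # (N, C, V) mixed over joints
        cooc_feat = mixed.mean(dim=2)                   # (N, C) global aggregation
        return self.cls_att(att_feat), self.cls_cooc(cooc_feat)

# Usage: both sets of logits are trained against the same label; at test time
# their scores can be averaged.
head = TwoBranchHead()
logits_att, logits_cooc = head(torch.randn(8, 256, 75, 25))
```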
References
Baradel F, Wolf C, Mille J (2017) Pose-conditioned spatio-temporal attention for human action recognition. CoRR abs/1703.10106
Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: Computer Vision and Pattern Recognition (CVPR), 2017
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: CVPR, 2017
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1110–1118
Gu J, Wang G, Chen T (2016) Recurrent highway networks with language cnn for image captioning. arXiv preprint arXiv:1612.07086
Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Appl Comput Harmon Anal 30(2):129–150
Hu J, Shen L, Albanie S (2017) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell PP(99):1–1
Jin SY, Choi HJ (2012) Essential body-joint and atomic action detection for human activity recognition using longest common subsequence algorithm. In: ICCV, pp 148–159
Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Viola F, Green T, Back T, Natsev P et al (2017) The kinetics human action video dataset. arXiv preprint arXiv:1705.06950
Ke Q, An S, Bennamoun M, Sohel F, Boussaid F (2017) SkeletonNet: mining deep part features for 3D action recognition. IEEE Signal Processing Letters
Ke Q, Bennamoun M, An S, Sohel F, Boussaid F (2017) A new representation of skeleton sequences for 3D action recognition. In: CVPR, July 2017
Kim TS, Reiter A (2017) Interpretable 3d human action analysis with temporal convolutional networks. In: BNMW CVPRW
Koniusz P, Cherian A, Porikli F (2016) Tensor representations via kernel linearization for action recognition from 3d skeletons. arXiv preprint arXiv:1604.00239
Li D, Chen X, Zhang Z, Huang K (2017) Learning deep context-aware features over body and latent parts for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 384–393
Li C, Zhong Q, Xie D, Pu S (2017) Skeleton-based action recognition with convolutional neural networks. arXiv preprint arXiv:1704.07595
Li W, Zhu X, Gong S (2018) Harmonious attention network for person reidentification. In: CVPR, vol 1, p 2
Li R, Wang S, Zhu F, Huang J (2018) Adaptive graph convolutional neural networks. arXiv preprint arXiv:1801.03226
Li C, Zhong Q, Xie D, Pu S (2018) Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. arXiv preprint arXiv:1804.06055
Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal LSTM with trust gates for 3D human action recognition. In: European conference on computer vision (ECCV). Springer, pp 816–833
Lu G, Zhou Y, Li X (2016) Efficient action recognition via local position offset of 3D skeletal body joints. Multimed Tools Appl 75(6):3479–3494
Nguyen TV (2015) STAP: spatial-temporal attention-aware pooling for action recognition[J]. IEEE Trans Circuits Syst Video Technol 25(1):77–86
Niepert M, Ahmed M, Kutzkov K (2016) Learning convolutional neural networks for graphs. In: International conference on machine learning
Sainath TN, Vinyals O, Senior A, Sak H (2015) Convolutional, long short-term memory, fully connected deep neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 4580–4584
Shahroudy A, Liu J, Ng T-T, Wang G (2016) NTU RGB+D: a large scale benchmark for 3D human activity analysis. In: CVPR, pp 1010–1019
Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional lstm network for skeleton-based action recognition. In: CVPR 2019
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: NIPS, pp 568–576
Song S, Lan C, Xing J, Zeng W, Liu J (2017) An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, February 4–9, 2017, San Francisco, California, USA, pp 4263–4270
Sun B, Kong D, Wang S (2018) Effective human action recognition using global and local offsets of skeleton joints. Multimed Tools Appl:1–25. Published online July 2018
Toshev A, Szegedy C (2013) DeepPose: human pose estimation via deep neural networks. CoRR abs/1312.4659
Wang H, Schmid C (2014) Action recognition with improved trajectories. In: IEEE International Conference on Computer Vision (ICCV)
Wang H et al (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103(1):60–79
Wang J, Liu Z, Wu Y, Yuan J (2014) Learning actionlet ensemble for 3d human action recognition. TPAMI 36(5):914
Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L (2016) Temporal segment networks: towards good practices for deep action recognition. In: ECCV, 2016, p 6
Wang C, Zhang Q, Huang C, Liu W, Wang X (2018) Mancs: a multi-task attentional network with curriculum sampling for person re-identification. In: ECCV 2018, pp 384–400
Weston J, Chopra S, Bordes A (2014) Memory networks. arXiv preprint arXiv:1410.3916
Xia L, Chen C-C, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 20–27
Xu K, Li C, Tian Y, Sonobe T, Kawarabayashi KI, Jegelka S (2018) Representation learning on graphs with jumping knowledge networks. arXiv preprint arXiv:1806.03536
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI
Yeung S, Russakovsky O, Jin N, Andriluka M, Mori G, Fei-Fei L (2015) Every moment counts: dense detailed labeling of actions in complex videos. Int J Comput Vis 126(2–4):375–389
Du Y, Fu Y, Wang L (2016) Skeleton based action recognition with convolutional neural network. In: Pattern Recognition, pp 579–583
Yu Y, Mann GK, Gosine RG (2010) An object-based visual attention model for robotic applications. IEEE Trans Syst Man Cybern B Cybern 40(5):1398–1412
Zhao Y, Xiong Y, Wang L, Wu Z, Tang X, Lin D (2017) Temporal action detection with structured segment networks. In: ICCV
Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep lstm networks. In: AAAI Conference on Artificial Intelligence (AAAI)
Zichao M, Zhixin S (2018) Time-varying LSTM networks for action recognition. Multimed Tools Appl:32275–32285. Published online December 2018
Cite this article
Tian, D., Lu, ZM., Chen, X. et al. An attentional spatial temporal graph convolutional network with co-occurrence feature learning for action recognition. Multimed Tools Appl 79, 12679–12697 (2020). https://doi.org/10.1007/s11042-020-08611-4