Focus on temporal graph convolutional networks with unified attention for skeleton-based action recognition

Gao, Bing-Kun; Dong, Le; Bi, Hong-Bo; Bi, Yun-Ze

doi:10.1007/s10489-021-02723-6

Published: 17 August 2021

Volume 52, pages 5608–5616, (2022)

Applied Intelligence

Bing-Kun Gao¹, Le Dong¹, Hong-Bo Bi¹ (ORCID: orcid.org/0000-0003-2442-330X) & Yun-Ze Bi¹

Abstract

Graph convolutional networks (GCNs) have received increasing attention in skeleton-based action recognition. Many existing GCN models emphasize spatial information and neglect temporal information, yet every action unfolds through changes over time. In addition, the channel, spatial, and temporal dimensions often contain redundant information. In this paper, we design a temporal graph convolutional network (FTGCN) module that captures more temporal information and balances it appropriately for each action. To better integrate channel, spatial, and temporal information, we propose a unified channel, spatial, and temporal attention model (CSTA). A basic block containing these two components is called FTC-GCN. Extensive experiments on two large-scale datasets, with comparisons against 17 methods on NTU-RGB+D and 8 methods on Kinetics-Skeleton, show that our method achieves the best performance among the compared methods for skeleton-based human action recognition.
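
The abstract does not state how CSTA computes its attention weights. Purely as an illustration of what gating along the channel, temporal, and joint (spatial) axes can look like, the sketch below assumes a single skeleton sequence stored as a nested list of shape C x T x V and uses a simple average-then-sigmoid gate per axis. The function names `axis_gates` and `apply_unified_gate`, the tensor layout, and the gating rule are all assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def axis_gates(x):
    """Compute one gate per channel, per frame, and per joint.

    x is a nested list with shape [C][T][V] (channels, frames, joints).
    Each gate is the sigmoid of the mean over the other two axes.
    This is an assumed, simplified gating rule, not the paper's CSTA.
    """
    C, T, V = len(x), len(x[0]), len(x[0][0])
    channel = [
        sigmoid(sum(x[c][t][v] for t in range(T) for v in range(V)) / (T * V))
        for c in range(C)
    ]
    temporal = [
        sigmoid(sum(x[c][t][v] for c in range(C) for v in range(V)) / (C * V))
        for t in range(T)
    ]
    joint = [
        sigmoid(sum(x[c][t][v] for c in range(C) for t in range(T)) / (C * T))
        for v in range(V)
    ]
    return channel, temporal, joint


def apply_unified_gate(x):
    """Rescale every feature by the product of its three axis gates."""
    channel, temporal, joint = axis_gates(x)
    return [
        [
            [x[c][t][v] * channel[c] * temporal[t] * joint[v]
             for v in range(len(joint))]
            for t in range(len(temporal))
        ]
        for c in range(len(channel))
    ]


# Toy input: 2 channels, 3 frames, 4 joints, all features equal to 1.0.
features = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]
gated = apply_unified_gate(features)
```

Multiplying the three gates is only one way to combine them; a learned model would typically replace the fixed averages with trainable layers.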

Author information

Authors and Affiliations

NorthEast Petroleum University, Daqing, China
Bing-Kun Gao, Le Dong, Hong-Bo Bi & Yun-Ze Bi

Corresponding author

Correspondence to Hong-Bo Bi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gao, BK., Dong, L., Bi, HB. et al. Focus on temporal graph convolutional networks with unified attention for skeleton-based action recognition. Appl Intell 52, 5608–5616 (2022). https://doi.org/10.1007/s10489-021-02723-6

Accepted: 26 July 2021
Published: 17 August 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10489-021-02723-6
