DOI: 10.1145/3372806.3372814
research-article

An Attention-Enhanced Recurrent Graph Convolutional Network for Skeleton-Based Action Recognition

Published: 21 January 2020

Abstract

The dynamic movements of the human skeleton have attracted growing attention as a robust modality for action recognition. Not all temporal stages and skeleton joints are informative for action recognition, and irrelevant information often introduces noise that degrades performance, so extracting discriminative temporal and spatial features is an important task. In this paper, we propose a novel end-to-end attention-enhanced recurrent graph convolutional network (AR-GCN) for skeleton-based action recognition. AR-GCN employs an attention-enhanced mechanism to pay different levels of attention to different temporal stages and spatial joints. This approach overcomes the information loss caused by using only keyframes and key joints. In particular, AR-GCN combines the graph convolutional network (GCN) with the bidirectional recurrent neural network (BRNN), which retains the original GCN's expressive power on irregular joint structures while improving its sequential modeling ability through the recurrent network. Experimental results on the widely used NTU and Kinetics datasets demonstrate the effectiveness of the proposed model.
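The abstract names three ingredients (graph convolution over joints, a bidirectional recurrent pass over frames, and attention over joints and temporal stages) but gives no layer-level details. The following is only a minimal NumPy sketch of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function and parameter names, the Kipf-and-Welling-style adjacency normalization, the plain tanh recurrence standing in for whatever recurrent cell the paper uses, the dot-product attention scores, the toy five-joint chain skeleton, and the random weights.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed, not from the paper)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def graph_conv(x, a_norm, w):
    """Per-frame graph convolution with ReLU: (T, V, C_in) -> (T, V, C_out)."""
    return np.maximum(np.einsum("uv,tvc,cd->tud", a_norm, x, w), 0.0)


def rnn_pass(x, w_in, w_rec):
    """Plain tanh RNN over the time axis, shared across joints: (T, V, C) -> (T, V, H)."""
    h = np.zeros((x.shape[1], w_rec.shape[0]))
    out = []
    for frame in x:
        h = np.tanh(frame @ w_in + h @ w_rec)
        out.append(h)
    return np.stack(out)


def ar_gcn_sketch(x, adj, params):
    """Hypothetical forward pass: GCN -> bidirectional RNN -> joint and temporal attention."""
    g = graph_conv(x, normalize_adjacency(adj), params["w_gcn"])
    fwd = rnn_pass(g, params["w_in_f"], params["w_rec_f"])
    # Backward direction: run over reversed frames, then restore the frame order.
    bwd = rnn_pass(g[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    h = np.concatenate([fwd, bwd], axis=-1)                      # (T, V, 2H)
    joint_attn = softmax(h @ params["v_joint"], axis=1)          # (T, V), weights over joints
    frame_feat = (joint_attn[..., None] * h).sum(axis=1)         # (T, 2H)
    time_attn = softmax(frame_feat @ params["v_time"], axis=0)   # (T,), weights over frames
    clip_feat = (time_attn[:, None] * frame_feat).sum(axis=0)    # (2H,)
    probs = softmax(clip_feat @ params["w_cls"], axis=0)         # (K,) class probabilities
    return probs, joint_attn, time_attn


# Toy input: T frames, V joints on a chain, C coordinates, H hidden units, K classes.
rng = np.random.default_rng(0)
T, V, C, H, K = 8, 5, 3, 4, 6
adj = np.zeros((V, V))
for i in range(V - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x = rng.normal(size=(T, V, C))
params = {
    "w_gcn": rng.normal(size=(C, H)),
    "w_in_f": rng.normal(size=(H, H)), "w_rec_f": rng.normal(size=(H, H)),
    "w_in_b": rng.normal(size=(H, H)), "w_rec_b": rng.normal(size=(H, H)),
    "v_joint": rng.normal(size=2 * H), "v_time": rng.normal(size=2 * H),
    "w_cls": rng.normal(size=(2 * H, K)),
}
probs, joint_attn, time_attn = ar_gcn_sketch(x, adj, params)
print(probs.shape, joint_attn.shape, time_attn.shape)
```

Because every frame and joint contributes through a soft weight rather than being selected or discarded, a design of this kind keeps information that hard keyframe or key-joint selection would drop, which is the motivation the abstract gives for the attention mechanism.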




    Published In

    SPML '19: Proceedings of the 2019 2nd International Conference on Signal Processing and Machine Learning
    November 2019
    135 pages
    ISBN:9781450372213
    DOI:10.1145/3372806

    In-Cooperation

    • Ritsumeikan University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. action recognition
    2. attention-enhanced
    3. graph convolution
    4. recurrent network
    5. skeleton-based

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SPML '19

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 22
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 25 Feb 2025

    Citations

    Cited By

    • (2024) Enhancing human behavior recognition with spatiotemporal graph convolutional neural networks and skeleton sequences. EURASIP Journal on Advances in Signal Processing 2024:1. DOI: 10.1186/s13634-024-01156-w. Online publication date: 7-May-2024
    • (2024) DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition. IEEE Transactions on Image Processing 33 (2477-2490). DOI: 10.1109/TIP.2024.3378886. Online publication date: 2024
    • (2024) MSA-GCN: Exploiting Multi-Scale Temporal Dynamics With Adaptive Graph Convolution for Skeleton-Based Action Recognition. IEEE Access 12 (193552-193563). DOI: 10.1109/ACCESS.2024.3520172. Online publication date: 2024
    • (2023) Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study. Sensors 23:11 (5121). DOI: 10.3390/s23115121. Online publication date: 27-May-2023
    • (2023) Adaptive Multi-Scale Difference Graph Convolution Network for Skeleton-Based Action Recognition. Electronics 12:13 (2852). DOI: 10.3390/electronics12132852. Online publication date: 28-Jun-2023
    • (2023) A review of skeleton-based human action recognition. Journal of Image and Graphics 28:12 (3651-3669). DOI: 10.11834/jig.230046. Online publication date: 2023
    • (2022) Skeleton Graph-Neural-Network-Based Human Action Recognition: A Survey. Sensors 22:6 (2091). DOI: 10.3390/s22062091. Online publication date: 8-Mar-2022
    • (2022) Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 18:2 (1-24). DOI: 10.1145/3472722. Online publication date: 16-Feb-2022
    • (2022) Learning Hierarchical Video Graph Networks for One-Stop Video Delivery. ACM Transactions on Multimedia Computing, Communications, and Applications 18:1 (1-23). DOI: 10.1145/3466886. Online publication date: 27-Jan-2022
    • (2022) Deep learning-based for human segmentation and tracking, 3D human pose estimation and action recognition on monocular video of MADS dataset. Multimedia Tools and Applications 82:14 (20771-20818). DOI: 10.1007/s11042-022-13921-w. Online publication date: 25-Oct-2022
