An Attention-Enhanced Recurrent Graph Convolutional Network for Skeleton-Based Action Recognition

Authors:

Wai Chen

SPML '19: Proceedings of the 2019 2nd International Conference on Signal Processing and Machine Learning

Pages 79 - 84

https://doi.org/10.1145/3372806.3372814

Published: 21 January 2020

Abstract

The dynamic movement of the human skeleton has attracted increasing attention as a robust modality for action recognition. Not all temporal stages and skeleton joints are informative for action recognition, and irrelevant information often introduces noise that can degrade recognition performance, so extracting discriminative temporal and spatial features is an important task. In this paper, we propose a novel end-to-end attention-enhanced recurrent graph convolutional network (AR-GCN) for skeleton-based action recognition. AR-GCN employs an attention mechanism that assigns different levels of attention to different temporal stages and spatial joints. This approach avoids the information loss caused by using only keyframes and key joints. In particular, AR-GCN combines a graph convolutional network (GCN) with a bidirectional recurrent neural network (BRNN): it retains the original GCN's expressive power on irregularly structured joints while improving its sequential modeling ability through the recurrent network. Experimental results on the widely used NTU and Kinetics datasets demonstrate the effectiveness of the proposed model.
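The abstract gives no equations, so the following is only a minimal sketch of the general idea it describes, not the authors' AR-GCN. It assumes one graph convolution per frame over a normalized skeleton adjacency, soft attention over joints, a plain tanh recurrent pass in each direction, and soft attention over frames before classification. Every name here (ar_gcn_sketch, init_params, u_joint, u_time, the toy 4-joint chain) is hypothetical, and the weights are random and untrained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def weighted_sum(weights, rows):
    """Sum of row vectors, each scaled by its weight."""
    return [sum(w * r[c] for w, r in zip(weights, rows)) for c in range(len(rows[0]))]


def normalized_adjacency(edges, n):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(A_hat, X, W):
    """One GCN layer: mix each joint with its neighbours, project, apply ReLU."""
    mixed = [weighted_sum(row, X) for row in A_hat]
    return [[max(0.0, v) for v in matvec(W, m)] for m in mixed]


def rnn_pass(seq, Wx, Wh):
    """Plain tanh recurrent pass; returns the hidden state at every step."""
    h = [0.0] * len(Wh)
    states = []
    for x in seq:
        h = [math.tanh(a + b) for a, b in zip(matvec(Wx, x), matvec(Wh, h))]
        states.append(h)
    return states


def ar_gcn_sketch(frames, edges, p):
    """frames: T x N x C joint coordinates. Returns class probabilities plus
    the per-frame joint attention and the temporal attention weights."""
    A_hat = normalized_adjacency(edges, len(frames[0]))
    frame_feats, joint_attn = [], []
    for X in frames:
        H = graph_conv(A_hat, X, p["W_gcn"])
        alpha = softmax([dot(p["u_joint"], h) for h in H])  # spatial attention
        joint_attn.append(alpha)
        frame_feats.append(weighted_sum(alpha, H))
    fwd = rnn_pass(frame_feats, p["Wx_f"], p["Wh_f"])
    bwd = rnn_pass(frame_feats[::-1], p["Wx_b"], p["Wh_b"])[::-1]
    states = [f + b for f, b in zip(fwd, bwd)]  # concatenate both directions
    beta = softmax([dot(p["u_time"], s) for s in states])  # temporal attention
    clip = weighted_sum(beta, states)
    return softmax(matvec(p["W_cls"], clip)), joint_attn, beta


def init_params(c_in, c_hid, n_classes, seed=0):
    """Random, untrained weights (hypothetical shapes); for shape checking only."""
    rng = random.Random(seed)
    mat = lambda r, c: [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]
    return {
        "W_gcn": mat(c_hid, c_in), "u_joint": mat(1, c_hid)[0],
        "Wx_f": mat(c_hid, c_hid), "Wh_f": mat(c_hid, c_hid),
        "Wx_b": mat(c_hid, c_hid), "Wh_b": mat(c_hid, c_hid),
        "u_time": mat(1, 2 * c_hid)[0], "W_cls": mat(n_classes, 2 * c_hid),
    }


if __name__ == "__main__":
    rng = random.Random(1)
    edges = [(0, 1), (1, 2), (2, 3)]  # toy 4-joint chain, not a real skeleton
    frames = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)] for _ in range(5)]
    probs, joint_attn, beta = ar_gcn_sketch(frames, edges, init_params(3, 8, 6))
    print(len(probs), round(sum(probs), 6), round(sum(beta), 6))
```

Because both attention steps use a softmax, every joint and every frame contributes with a nonzero weight. This is one way to read the abstract's contrast with methods that keep only keyframes and key joints, although the paper itself may define its attention differently.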

Index Terms

An Attention-Enhanced Recurrent Graph Convolutional Network for Skeleton-Based Action Recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Activity recognition and understanding

Published In

SPML '19: Proceedings of the 2019 2nd International Conference on Signal Processing and Machine Learning

November 2019

135 pages

ISBN:9781450372213

DOI:10.1145/3372806

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Ritsumeikan University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SPML '19

SPML '19: 2019 2nd International Conference on Signal Processing and Machine Learning

November 27 - 29, 2019

Hangzhou, China

Bibliometrics

Total Citations: 12
Total Downloads: 281
Downloads (Last 12 months): 22
Downloads (Last 6 weeks): 1

Reflects downloads up to 25 Feb 2025

Citations

Cited By

Xu J, Liu F, Wang Q, Zou R, Wang Y, Zheng J, Du S, Zeng W (2024). Enhancing human behavior recognition with spatiotemporal graph convolutional neural networks and skeleton sequences. EURASIP Journal on Advances in Signal Processing 2024:1. Online publication date: 7-May-2024.
https://doi.org/10.1186/s13634-024-01156-w
Myung W, Su N, Xue J, Wang G (2024). DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition. IEEE Transactions on Image Processing 33 (2477-2490). Online publication date: 2024.
https://doi.org/10.1109/TIP.2024.3378886
Comivi Alowonou K, Han J (2024). MSA-GCN: Exploiting Multi-Scale Temporal Dynamics With Adaptive Graph Convolution for Skeleton-Based Action Recognition. IEEE Access 12 (193552-193563). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3520172
Nguyen H, Nguyen T, Scherer R, Le V (2023). Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study. Sensors 23:11 (5121). Online publication date: 27-May-2023.
https://doi.org/10.3390/s23115121
Wang X, Gan Z, Jin L, Xiao Y, He M (2023). Adaptive Multi-Scale Difference Graph Convolution Network for Skeleton-Based Action Recognition. Electronics 12:13 (2852). Online publication date: 28-Jun-2023.
https://doi.org/10.3390/electronics12132852
Jian L, Xuanfeng L, Bo Z, Jian Z (2023). A review of skeleton-based human action recognition. Journal of Image and Graphics 28:12 (3651-3669). Online publication date: 2023.
https://doi.org/10.11834/jig.230046
Feng M, Meunier J (2022). Skeleton Graph-Neural-Network-Based Human Action Recognition: A Survey. Sensors 22:6 (2091). Online publication date: 8-Mar-2022.
https://doi.org/10.3390/s22062091
Tang Y, Liu X, Yu X, Zhang D, Lu J, Zhou J (2022). Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 18:2 (1-24). Online publication date: 16-Feb-2022.
https://dl.acm.org/doi/10.1145/3472722
Song Y, Gao J, Yang X, Xu C (2022). Learning Hierarchical Video Graph Networks for One-Stop Video Delivery. ACM Transactions on Multimedia Computing, Communications, and Applications 18:1 (1-23). Online publication date: 27-Jan-2022.
https://dl.acm.org/doi/10.1145/3466886
Le V (2022). Deep learning-based for human segmentation and tracking, 3D human pose estimation and action recognition on monocular video of MADS dataset. Multimedia Tools and Applications 82:14 (20771-20818). Online publication date: 25-Oct-2022.
https://doi.org/10.1007/s11042-022-13921-w
