research-article

A Semantics-Guided Graph Convolutional Network for Skeleton-Based Action Recognition

Authors:

Wai Chen

ICIAI '20: Proceedings of the 2020 the 4th International Conference on Innovation in Artificial Intelligence

Pages 130 - 136

https://doi.org/10.1145/3390557.3394129

Published: 04 June 2020

Abstract

Action recognition with skeleton data is a challenging task in computer vision. Graph convolutional networks (GCNs), which directly model the human body skeleton as a graph structure, have achieved remarkable performance. However, current GCN architectures are limited by the small receptive field of their convolution filters: they capture only local physical dependencies among joints and use all skeleton data indiscriminately. To address these limitations and to achieve a flexible graph representation of the skeleton features, we propose a novel semantics-guided graph convolutional network (Sem-GCN) for skeleton-based action recognition. Sem-GCN employs three types of semantic graph modules: a structural graph extraction module that aggregates information from L-hop joint neighbors, an actional graph inference module that captures action-specific latent dependencies, and an attention graph iteration module that assigns importance levels. Combining these semantic graphs into a generalized skeleton graph, we further propose the semantics-guided graph convolution block, which stacks semantic graph convolution and temporal convolution, to learn both semantic and temporal features for action recognition. Experimental results demonstrate the effectiveness of our proposed model on the widely used NTU and Kinetics datasets.
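The idea of aggregating information from L-hop joint neighbors can be illustrated with a minimal sketch. The abstract does not give the module's formulation, so everything here is an assumption made for illustration: the function names, the 5-joint chain skeleton, the uniform weighting of all joints within L hops, and the row normalization. It is not the authors' structural graph extraction module.

```python
from collections import deque


def hop_distances(num_joints, bones):
    """Shortest-path hop count between every pair of joints (one BFS per joint)."""
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [[None] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def l_hop_adjacency(num_joints, bones, max_hop):
    """Row-normalized adjacency linking each joint to every joint within max_hop hops.

    Assumption: uniform weights, with the joint itself included (hop 0).
    """
    dist = hop_distances(num_joints, bones)
    adj = []
    for i in range(num_joints):
        row = [1.0 if d is not None and d <= max_hop else 0.0 for d in dist[i]]
        total = sum(row)  # at least 1.0 because the joint itself is included
        adj.append([x / total for x in row])
    return adj


def aggregate(adj, features):
    """One aggregation step: each joint receives the weighted mean of its neighbors' features."""
    dim = len(features[0])
    n = len(features)
    return [
        [sum(adj[i][j] * features[j][c] for j in range(n)) for c in range(dim)]
        for i in range(n)
    ]


# Hypothetical 5-joint chain skeleton: 0-1-2-3-4, one scalar feature per joint.
bones = [(0, 1), (1, 2), (2, 3), (3, 4)]
features = [[0.0], [1.0], [2.0], [3.0], [4.0]]

one_hop = aggregate(l_hop_adjacency(5, bones, 1), features)
two_hop = aggregate(l_hop_adjacency(5, bones, 2), features)
```

With `max_hop=1` the end joint 0 averages only itself and joint 1, while with `max_hop=2` it also sees joint 2. This widening of the receptive field beyond physically adjacent joints is the effect the abstract attributes to L-hop aggregation.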

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Activity recognition and understanding

Published In

ICIAI '20: Proceedings of the 2020 the 4th International Conference on Innovation in Artificial Intelligence

May 2020

271 pages

ISBN: 9781450376587

DOI: 10.1145/3390557

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

The Hong Kong Polytechnic University
Xi'an Jiaotong-Liverpool University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICIAI 2020

ICIAI 2020: 2020 the 4th International Conference on Innovation in Artificial Intelligence

May 8 - 11, 2020

Xiamen, China

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 278
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 2

Reflects downloads up to 25 Feb 2025

Citations

Cited By

Chen Z, Huang W, Liu H, Wang Z, Wen Y, Wang S (2024). ST-TGR: Spatio-Temporal Representation Learning for Skeleton-Based Teaching Gesture Recognition. Sensors 24:8 (2589). Online publication date: 18-Apr-2024.
https://doi.org/10.3390/s24082589
Yao J, Xiang F, Wei X, Yuan K (2024). NEF-GGCN: Node-Edge Fusion Gated Graph Convolutional Networks For Skeleton-based Medical Action Recognition. 2024 IEEE 9th International Conference on Data Science in Cyberspace (DSC) (93-100). Online publication date: 23-Aug-2024.
https://doi.org/10.1109/DSC63484.2024.00020
Chen M, Liang J, Liu H (2024). Multi-stream P&U adaptive graph convolutional networks for skeleton-based action recognition. The Journal of Supercomputing 80:8 (11614-11639). Online publication date: 29-Jan-2024.
https://dl.acm.org/doi/10.1007/s11227-024-05900-9
Nguyen H, Nguyen T, Scherer R, Le V (2023). Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study. Sensors 23:11 (5121). Online publication date: 27-May-2023.
https://doi.org/10.3390/s23115121
Qi H, Guo X, Xin H, Li S, Chen E (2023). Comprehensive receptive field adaptive graph convolutional networks for action recognition. Journal of Visual Communication and Image Representation 97 (103953). Online publication date: Dec-2023.
https://doi.org/10.1016/j.jvcir.2023.103953
Wu Q, Zhao X, Zhang Z, Zhang T, Peng Z (2023). Algorithm for Human Abnormal Behavior Recognition Based on Improved Spatial Temporal Graph Convolutional Networks. Advanced Computational Intelligence and Intelligent Informatics (29-42). Online publication date: 30-Oct-2023.
https://doi.org/10.1007/978-981-99-7593-8_4
Kim J, Seo H, Naseem M, Lee C (2022). Pathological-Gait Recognition Using Spatiotemporal Graph Convolutional Networks and Attention Model. Sensors 22:13 (4863). Online publication date: 27-Jun-2022.
https://doi.org/10.3390/s22134863
Feng M, Meunier J (2022). Skeleton Graph-Neural-Network-Based Human Action Recognition: A Survey. Sensors 22:6 (2091). Online publication date: 8-Mar-2022.
https://doi.org/10.3390/s22062091
Fanuel M, Yuan X, Nam Kim H, Qingge L, Roy K (2021). A Survey on Skeleton-Based Activity Recognition using Graph Convolutional Networks (GCN). 2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA) (177-182). Online publication date: 13-Sep-2021.
https://doi.org/10.1109/ISPA52656.2021.9552064
Hu Z, Lee E (2020). Dual Attention-Guided Multiscale Dynamic Aggregate Graph Convolutional Networks for Skeleton-Based Human Action Recognition. Symmetry 12:10 (1589). Online publication date: 24-Sep-2020.
https://doi.org/10.3390/sym12101589
