DOI:10.1145/3390557.3394129
research-article

A Semantics-Guided Graph Convolutional Network for Skeleton-Based Action Recognition

Published: 04 June 2020

Abstract

Action recognition with skeleton data is a challenging task in computer vision. Graph convolutional networks (GCNs), which directly model the human body skeleton as a graph, have achieved remarkable performance. However, current GCN architectures are limited by the small receptive field of their convolution filters: they capture only local physical dependencies among joints and use all skeleton data indiscriminately. To address these limitations and to achieve a flexible graph representation of skeleton features, we propose a novel semantics-guided graph convolutional network (Sem-GCN) for skeleton-based action recognition. Sem-GCN employs three types of semantic graph modules: a structural graph extraction module that aggregates information from L-hop joint neighbors, an actional graph inference module that captures action-specific latent dependencies, and an attention graph iteration module that assigns importance levels. Combining these semantic graphs into a generalized skeleton graph, we further propose the semantics-guided graph convolution block, which stacks semantic graph convolution and temporal convolution to learn both semantic and temporal features for action recognition. Experimental results on the widely used NTU and Kinetics datasets demonstrate the effectiveness of the proposed model.
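
The abstract gives no formulas, so the following is only an illustrative sketch of what aggregating L-hop joint neighbors could look like, not the authors' implementation. It assumes a hypothetical 5-joint chain skeleton, one row-normalized adjacency matrix per hop distance, and a scalar weight per hop standing in for learned parameters. The actional and attention graphs, which the paper learns from data, are not modeled here.

```python
# Illustrative sketch only: names, the toy skeleton, and the scalar
# per-hop weights are assumptions, not taken from the paper.

def hop_distance(num_joints, edges):
    """Breadth-first hop distance between every pair of joints."""
    neighbors = {j: set() for j in range(num_joints)}
    for a, b in edges:  # bones are treated as undirected edges
        neighbors[a].add(b)
        neighbors[b].add(a)
    inf = float("inf")
    dist = [[inf] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        frontier = [src]
        while frontier:
            next_frontier = []
            for u in frontier:
                for v in neighbors[u]:
                    if dist[src][v] == inf:
                        dist[src][v] = dist[src][u] + 1
                        next_frontier.append(v)
            frontier = next_frontier
    return dist


def l_hop_adjacencies(dist, max_hop):
    """One row-normalized adjacency matrix per hop distance 0..max_hop."""
    n = len(dist)
    graphs = []
    for hop in range(max_hop + 1):
        adj = [[1.0 if dist[i][j] == hop else 0.0 for j in range(n)]
               for i in range(n)]
        for row in adj:
            total = sum(row)
            if total > 0:  # joints with no neighbor at this hop stay all-zero
                for j in range(n):
                    row[j] /= total
        graphs.append(adj)
    return graphs


def aggregate(graphs, hop_weights, features):
    """Sum over hops of hop_weight * (A_hop @ X), with X of shape joints x channels."""
    n, channels = len(features), len(features[0])
    out = [[0.0] * channels for _ in range(n)]
    for adj, weight in zip(graphs, hop_weights):
        for i in range(n):
            for j in range(n):
                if adj[i][j]:
                    for k in range(channels):
                        out[i][k] += weight * adj[i][j] * features[j][k]
    return out


# Hypothetical chain skeleton: joint 0 - 1 - 2 - 3 - 4.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
dist = hop_distance(5, EDGES)
graphs = l_hop_adjacencies(dist, max_hop=2)
features = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # one channel per joint
print(aggregate(graphs, [1.0, 1.0, 1.0], features))
```

With max_hop set to 2, each joint mixes its own feature with the mean of its 1-hop and 2-hop neighbors, which widens the receptive field beyond directly connected joints.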

    Published In

    ICIAI '20: Proceedings of the 2020 the 4th International Conference on Innovation in Artificial Intelligence
    May 2020
    271 pages
    ISBN:9781450376587
    DOI:10.1145/3390557
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • The Hong Kong Polytechnic University
    • Xi'an Jiaotong-Liverpool University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 June 2020

    Author Tags

    1. action recognition
    2. graph convolution
    3. semantics-guided
    4. skeleton-based

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICIAI 2020

    Article Metrics

    • Downloads (last 12 months): 14
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 25 Feb 2025

    Cited By

    • (2024) ST-TGR: Spatio-Temporal Representation Learning for Skeleton-Based Teaching Gesture Recognition. Sensors 24:8 (2589). DOI:10.3390/s24082589. Online publication date: 18-Apr-2024
    • (2024) NEF-GGCN: Node-Edge Fusion Gated Graph Convolutional Networks For Skeleton-based Medical Action Recognition. 2024 IEEE 9th International Conference on Data Science in Cyberspace (DSC) (93-100). DOI:10.1109/DSC63484.2024.00020. Online publication date: 23-Aug-2024
    • (2024) Multi-stream P&U adaptive graph convolutional networks for skeleton-based action recognition. The Journal of Supercomputing 80:8 (11614-11639). DOI:10.1007/s11227-024-05900-9. Online publication date: 29-Jan-2024
    • (2023) Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study. Sensors 23:11 (5121). DOI:10.3390/s23115121. Online publication date: 27-May-2023
    • (2023) Comprehensive receptive field adaptive graph convolutional networks for action recognition. Journal of Visual Communication and Image Representation 97 (103953). DOI:10.1016/j.jvcir.2023.103953. Online publication date: Dec-2023
    • (2023) Algorithm for Human Abnormal Behavior Recognition Based on Improved Spatial Temporal Graph Convolutional Networks. Advanced Computational Intelligence and Intelligent Informatics (29-42). DOI:10.1007/978-981-99-7593-8_4. Online publication date: 30-Oct-2023
    • (2022) Pathological-Gait Recognition Using Spatiotemporal Graph Convolutional Networks and Attention Model. Sensors 22:13 (4863). DOI:10.3390/s22134863. Online publication date: 27-Jun-2022
    • (2022) Skeleton Graph-Neural-Network-Based Human Action Recognition: A Survey. Sensors 22:6 (2091). DOI:10.3390/s22062091. Online publication date: 8-Mar-2022
    • (2021) A Survey on Skeleton-Based Activity Recognition using Graph Convolutional Networks (GCN). 2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA) (177-182). DOI:10.1109/ISPA52656.2021.9552064. Online publication date: 13-Sep-2021
    • (2020) Dual Attention-Guided Multiscale Dynamic Aggregate Graph Convolutional Networks for Skeleton-Based Human Action Recognition. Symmetry 12:10 (1589). DOI:10.3390/sym12101589. Online publication date: 24-Sep-2020