Research article · DOI: 10.1145/3581783.3612280

Self-Relational Graph Convolution Network for Skeleton-Based Action Recognition

Published: 27 October 2023

Abstract

Using a graph convolution network (GCN) to construct and aggregate node features has proven helpful for skeleton-based action recognition. The strength of the relations among the joints of an action sequence is what distinguishes it from other actions. This work proposes a novel spatial module, Multi-Scale Self-Relational Graph Convolution (MS-SRGC), for dynamically modeling the joint relations of action instances. Modeling the joints' relations is crucial for determining the spatial distinctiveness between skeleton sequences; hence, MS-SRGC is effective for activity recognition. We also propose a Hybrid Multi-Scale Temporal Convolution Network (HMS-TCN) that captures different ranges of time steps along the temporal dimension of the skeleton sequence. In addition, we propose a Spatio-Temporal Blackout (STB) module that randomly zeroes several consecutive frames for selected strategic joint groups. We stack our spatial (MS-SRGC) and temporal (HMS-TCN) modules sequentially to form a Self-Relational Graph Convolution Network (SR-GCN) block, from which we construct our SR-GCN model, and we append the STB module on top of the SR-GCN model for the randomized operation. Exploiting the effectiveness of ensemble networks, we perform extensive experiments with single and multiple ensembles. Our results beat the state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets.
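The abstract builds on standard graph convolution for aggregating joint features over the skeleton graph. The paper's MS-SRGC module itself is not reproduced here; as background, the following is a minimal NumPy sketch of the plain Kipf-and-Welling-style GCN layer that such modules extend. The function name, tensor shapes, and the ReLU choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gcn_layer(x, adj, weight):
    """One graph-convolution layer: symmetrically normalize the skeleton
    adjacency (with self-loops), aggregate neighboring joint features,
    then apply a linear transform and ReLU.

    x: (V, C_in) joint features; adj: (V, V) adjacency; weight: (C_in, C_out).
    """
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^{-1/2} of the augmented graph
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)     # aggregate, transform, ReLU
```

Stacking several such layers with different (learned or multi-hop) adjacencies is the usual route to the "multi-scale" spatial modeling the abstract describes.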
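The STB module is described only at the level of "randomly zeroing consecutive frames for selected joint groups", so the following is a hedged NumPy sketch of that stated behavior. The function name, the (frames, joints, channels) layout, and the sampling choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def spatio_temporal_blackout(x, joint_groups, span, rng=None):
    """Zero a random run of `span` consecutive frames for one randomly
    chosen joint group (e.g. an arm or a leg).

    x: (T, V, C) skeleton sequence; joint_groups: list of joint-index lists.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = x.copy()                                   # leave the input sequence intact
    t0 = rng.integers(0, x.shape[0] - span + 1)    # random start frame of the blackout
    group = joint_groups[rng.integers(len(joint_groups))]  # pick one joint group
    x[t0:t0 + span, group, :] = 0.0                # blank those joints over the window
    return x
```

Applied during training, such a blackout acts as a structured dropout (in the spirit of Dropout/DropGraph) that forces the model not to rely on any single joint group or time window.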

Supplemental Material

MP4 File
This talk presents the research scope of our paper, describes the open issues in this area, and then succinctly explains our approach, experimental results, and principal contributions.


Cited By

  • (2024) Towards Multi-view Consistent Graph Diffusion. Proceedings of the 32nd ACM International Conference on Multimedia, 186-195. DOI: 10.1145/3664647.3681258. Online publication date: 28-Oct-2024.
  • (2024) Localized Linear Temporal Dynamics for Self-Supervised Skeleton Action Recognition. IEEE Transactions on Multimedia 26, 10189-10199. DOI: 10.1109/TMM.2024.3405712. Online publication date: 1-Jan-2024.
  • (2024) Multi-scale sampling attention graph convolutional networks for skeleton-based action recognition. Neurocomputing 597, 128086. DOI: 10.1016/j.neucom.2024.128086. Online publication date: Sep-2024.

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. self-relational graph
    2. skeleton-based action recognition
    3. spatio-temporal blackout
    4. temporal convolution

    Conference

    MM '23
    Sponsor:
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
