
SAGN: Semantic Adaptive Graph Network for Skeleton-Based Human Action Recognition

Published: 01 September 2021
DOI: 10.1145/3460426.3463633

ABSTRACT

With the continuing development and popularity of depth cameras, skeleton-based human action recognition has attracted wide attention, and Graph Convolutional Networks (GCNs) have achieved remarkable performance on the task. However, existing methods do not fully exploit semantic information, which helps express the current concept and scene and enables finer-grained classification. In addition, most existing models are computationally expensive, whereas an adaptive GCN can automatically learn the graph structure and capture the connections between joints. In this paper, we propose a comparatively lightweight model, the semantic adaptive graph network (SAGN), for skeleton-based human action recognition. Specifically, we combine dynamic characteristics and bone information when extracting features from the data, and bring the correlations between semantics into the model. During training, SAGN includes an adaptive network that makes the attention mechanism more flexible, and we design a Convolutional Neural Network (CNN) for feature extraction along the time dimension. Experimental results show that SAGN achieves state-of-the-art performance on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets and can promote further study of skeleton-based human action recognition. The source code is available at https://github.com/skeletonNN/SAGN.

Published in:
ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021, 715 pages
ISBN: 9781450384636
DOI: 10.1145/3460426
Copyright © 2021 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

