ABSTRACT
With the continuous development and popularity of depth cameras, skeleton-based human action recognition has attracted wide attention, and Graph Convolutional Networks (GCNs) have achieved remarkable performance on this task. However, existing methods do not fully exploit semantic information, which helps express the current concept and scene and supports finer-grained classification. In addition, most existing models are computationally expensive. Moreover, an adaptive GCN can learn the graph structure automatically and model the connections between joints. In this paper, we propose a comparatively lightweight model that combines a semantic and adaptive graph network (SAGN) for skeleton-based human action recognition. Specifically, we combine dynamic characteristics and bone information to extract features, feeding the correlations between semantics into the model. During training, SAGN includes an adaptive network that makes the attention mechanism more flexible. We design a Convolutional Neural Network (CNN) for feature extraction along the temporal dimension. Experimental results show that SAGN achieves state-of-the-art performance on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets, and can promote further study of skeleton-based human action recognition. The source code is available at https://github.com/skeletonNN/SAGN.
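The adaptive graph idea mentioned in the abstract — a fixed skeleton adjacency combined with a data-dependent matrix inferred from feature similarity, so joint connections need not be hand-specified — can be sketched as follows. This is a minimal NumPy illustration of the general adaptive graph-convolution technique, not the authors' implementation; all function and parameter names (`w_theta`, `w_phi`, `w_out`) and shapes are assumptions for the sketch.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_graph_conv(x, a_static, w_theta, w_phi, w_out):
    """One adaptive graph-convolution step (hypothetical parameter names).

    x:              (V, C) joint features for a single frame
    a_static:       (V, V) fixed skeleton adjacency
    w_theta, w_phi: (C, D) embedding weights used to infer a learned graph
    w_out:          (C, C_out) output projection
    """
    # Data-dependent adjacency from similarity of embedded joint features
    a_dyn = softmax((x @ w_theta) @ (x @ w_phi).T, axis=-1)
    # Combine the fixed skeleton graph with the learned one, then normalize
    a = softmax(a_static + a_dyn, axis=-1)
    # Aggregate neighbor features along the combined graph and project
    return a @ x @ w_out

rng = np.random.default_rng(0)
V, C = 25, 8                       # 25 joints (NTU skeleton), 8 channels
x = rng.standard_normal((V, C))
a_static = np.eye(V)               # placeholder skeleton adjacency
out = adaptive_graph_conv(x, a_static,
                          rng.standard_normal((C, 4)),
                          rng.standard_normal((C, 4)),
                          rng.standard_normal((C, 16)))
print(out.shape)                   # (25, 16)
```

In a full model, this per-frame spatial step would be interleaved with 1-D convolutions along the time axis — the role the abstract assigns to SAGN's CNN for temporal feature extraction.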
SAGN: Semantic Adaptive Graph Network for Skeleton-Based Human Action Recognition