
SAGN: Semantic Adaptive Graph Network for Skeleton-Based Human Action Recognition

Published: 01 September 2021
DOI: 10.1145/3460426.3463633

ABSTRACT

With the continuing development and popularity of depth cameras, skeleton-based human action recognition has attracted wide attention, and Graph Convolutional Networks (GCNs) have achieved remarkable performance on the task. However, existing methods do not fully exploit semantic information, which helps express the current concept and scene and enables finer-grained classification. In addition, most existing models are computationally expensive, whereas an adaptive GCN can automatically learn the graph structure and capture the connections between joints. In this paper, we propose a comparatively lightweight model, the semantic adaptive graph network (SAGN), for skeleton-based human action recognition. Specifically, we combine dynamic characteristics and bone information when extracting features from the data, and bring the correlations between semantics into the model. During training, SAGN includes an adaptive network that makes the attention mechanism more flexible, and we design a Convolutional Neural Network (CNN) for feature extraction along the time dimension. Experimental results show that SAGN achieves state-of-the-art performance on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets and can promote further study of skeleton-based human action recognition. The source code is available at https://github.com/skeletonNN/SAGN.

Published in:
ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021, 715 pages
ISBN: 9781450384636
DOI: 10.1145/3460426
Copyright © 2021 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

