DOI: 10.1145/3556677.3556692

Multi-Stream Graph Convolutional Networks with Efficient Spatial-Temporal Attention for Skeleton-Based Action Recognition

Published: 08 October 2022

ABSTRACT

In skeleton-based action recognition, methods based on graph convolutional networks (GCNs) have achieved remarkable performance by building skeleton coordinates into spatial-temporal graphs and exploiting the relationships between body joints. ST-GCN [19], proposed by Yan et al., is regarded as a pioneering method that first introduced GCNs to skeleton-based action recognition. However, it applies graph convolution to the joints of each frame equally, so joints that contribute little to the action introduce interference into the intermediate feature maps. We design a spatial-temporal attention module to capture significant features in the spatial and temporal dimensions simultaneously. Moreover, we adopt inverted-bottleneck temporal convolutional networks with residual connections to reduce computation while learning richer features. Besides the useful information carried by joints, bones and their motion also contain learnable cues for classifying actions, so we feed these modalities into a multi-stream framework. Finally, we demonstrate the efficiency of the proposed MSEA-GCN on the NTU RGB+D datasets.
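The two building blocks described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names, kernel sizes, and expansion ratio below are assumptions. The sketch only shows the general pattern of an ECA-style attention applied along the temporal and spatial (joint) axes [11] and a MobileNetV2-style inverted bottleneck [10] used as the temporal convolution, with input tensors laid out as (batch, channels, frames, joints).

```python
# Hypothetical sketch, not the paper's code. Tensor layout: (N, C, T, V).
import torch
import torch.nn as nn


class EfficientSTAttention(nn.Module):
    """ECA-style attention over the temporal and spatial (joint) axes.

    Assumption: frame/joint importance is estimated with a cheap 1-D conv
    over channel-pooled features, following the efficient-channel-attention
    idea in [11]; the kernel size is a placeholder.
    """

    def __init__(self, kernel_size: int = 9):
        super().__init__()
        pad = kernel_size // 2
        self.conv_t = nn.Conv1d(1, 1, kernel_size, padding=pad, bias=False)
        self.conv_v = nn.Conv1d(1, 1, kernel_size, padding=pad, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                  # x: (N, C, T, V)
        # Temporal attention: pool over channels and joints -> (N, 1, T)
        a_t = self.sigmoid(self.conv_t(x.mean(dim=(1, 3)).unsqueeze(1)))
        # Spatial attention: pool over channels and frames -> (N, 1, V)
        a_v = self.sigmoid(self.conv_v(x.mean(dim=(1, 2)).unsqueeze(1)))
        return x * a_t.unsqueeze(-1) * a_v.unsqueeze(2)


class InvertedBottleneckTCN(nn.Module):
    """MobileNetV2-style inverted bottleneck [10] as the temporal convolution:
    expand with 1x1, depthwise conv along the frame axis, project back."""

    def __init__(self, channels: int, expansion: int = 4,
                 kernel_size: int = 9, stride: int = 1):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),            # expand
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, (kernel_size, 1),            # depthwise temporal conv
                      stride=(stride, 1), padding=(kernel_size // 2, 0),
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),            # project (linear bottleneck)
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):                                  # x: (N, C, T, V)
        out = self.block(x)
        return out + x if out.shape == x.shape else out    # residual when shapes match


if __name__ == "__main__":
    x = torch.randn(2, 64, 300, 25)          # 2 clips, 64 channels, 300 frames, 25 joints
    y = InvertedBottleneckTCN(64)(EfficientSTAttention()(x))
    print(y.shape)                            # torch.Size([2, 64, 300, 25])
```

In the multi-stream setting, bone features are commonly derived as the coordinate difference between a joint and its adjacent joint, and motion features as the frame-to-frame difference of joints or bones; each stream is trained with its own network and the softmax scores are fused, as in the two-stream adaptive GCN [8].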

References

  1. Yan, Sijie, Yuanjun Xiong, and Dahua Lin. "Spatial temporal graph convolutional networks for skeleton-based action recognition." In Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
  2. Chu, Xiao, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, and Xiaogang Wang. "Multi-context attention for human pose estimation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1831-1840. 2017.
  3. Yang, Wei, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang. "End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3073-3082. 2016.
  4. Du, Yong, Wei Wang, and Liang Wang. "Hierarchical recurrent neural network for skeleton based action recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110-1118. 2015.
  5. Wang, Hongsong, and Liang Wang. "Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 499-508. 2017.
  6. Caetano, Carlos, Jessica Sena, François Brémond, Jefersson A. Dos Santos, and William Robson Schwartz. "SkeleMotion: A new representation of skeleton joint sequences based on motion information for 3D action recognition." In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1-8. IEEE, 2019.
  7. Li, Yanshan, Rongjie Xia, Xing Liu, and Qinghua Huang. "Learning shape-motion representations from geometric algebra spatio-temporal model for skeleton-based action recognition." In 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 1066-1071. IEEE, 2019.
  8. Shi, Lei, Yifan Zhang, Jian Cheng, and Hanqing Lu. "Two-stream adaptive graph convolutional networks for skeleton-based action recognition." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12026-12035. 2019.
  9. Song, Yi-Fan, Zhang Zhang, Caifeng Shan, and Liang Wang. "Richly activated graph convolutional network for robust skeleton-based action recognition." IEEE Transactions on Circuits and Systems for Video Technology 31, no. 5 (2020): 1915-1925.
  10. Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "MobileNetV2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520. 2018.
  11. Wang, Qilong, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, and Qinghua Hu. "ECA-Net: Efficient channel attention for deep convolutional neural networks." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534-11542. 2020.
  12. Veeriah, Vivek, Naifan Zhuang, and Guo-Jun Qi. "Differential recurrent neural networks for action recognition." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4041-4049. 2015.
  13. Si, Chenyang, Ya Jing, Wei Wang, Liang Wang, and Tieniu Tan. "Skeleton-based action recognition with spatial reasoning and temporal stack learning." In Proceedings of the European Conference on Computer Vision (ECCV), pp. 103-118. 2018.
  14. Li, Maosen, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, and Qi Tian. "Actional-structural graph convolutional networks for skeleton-based action recognition." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3595-3603. 2019.
  15. Song, Yi-Fan, Zhang Zhang, and Liang Wang. "Richly activated graph convolutional network for action recognition with incomplete skeletons." In 2019 IEEE International Conference on Image Processing (ICIP), pp. 1-5. IEEE, 2019.
  16. Thakkar, Kalpit, and P. J. Narayanan. "Part-based graph convolutional network for action recognition." arXiv preprint arXiv:1809.04983 (2018).
  17. Li, Chao, Qiaoyong Zhong, Di Xie, and Shiliang Pu. "Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation." arXiv preprint arXiv:1804.06055 (2018).
  18. Li, Bin, Xi Li, Zhongfei Zhang, and Fei Wu. "Spatio-temporal graph routing for skeleton-based action recognition." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 8561-8568. 2019.
  19. Liu, Mengyuan, Hong Liu, and Chen Chen. "Enhanced skeleton visualization for view invariant human action recognition." Pattern Recognition 68 (2017): 346-362.

Published in

    ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies
    July 2022
    155 pages
    ISBN: 9781450396936
    DOI: 10.1145/3556677

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 8 October 2022


    Qualifiers

    • research-article
    • Research
    • Refereed limited
