research-article

Multi-Stream Graph Convolutional Networks with Efficient Spatial-Temporal Attention for Skeleton-Based Action Recognition

Authors:

Hui Yueting
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China

Sun Wensheng
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China

ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies, July 2022, Pages 32–36, https://doi.org/10.1145/3556677.3556692

Published: 08 October 2022

ABSTRACT

In skeleton-based action recognition, methods based on graph convolutional networks (GCNs) have achieved remarkable performance by organizing skeleton coordinates into spatial-temporal graphs and exploring the relationships between body joints. ST-GCN [19], proposed by Yan et al., is regarded as a pioneering method: it was the first to introduce GCNs to skeleton-based action recognition. However, it applies graph convolution to the joints of every frame equally, so joints that contribute little to an action interfere with the generation of intermediate feature maps. We design a spatial-temporal attention module that captures significant features in the spatial and temporal dimensions simultaneously. Moreover, we adopt inverted-bottleneck temporal convolutional networks to reduce the computational cost and learn more features through a residual structure. Besides the useful information in joints, bones and their movements also carry learnable information for distinguishing action categories, so we feed the data into a multi-stream framework. Finally, we demonstrate the efficiency of the proposed MSEA-GCN on the NTU RGB+D datasets.
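The abstract does not spell out how the bone and movement inputs are derived from the joint coordinates. The following is a minimal sketch under a common convention in multi-stream skeleton models, and is an assumption rather than the authors' confirmed procedure: a bone is the vector from a joint's parent to the joint, and movement is the difference between consecutive frames. The 5-joint `PARENTS` table and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy skeleton: PARENTS[v] is the joint that joint v attaches to.
# Joint 0 is the root and is its own parent, so its bone vector is zero.
PARENTS = [0, 0, 1, 1, 3]


def sub(a, b):
    """Element-wise difference of two coordinate lists."""
    return [x - y for x, y in zip(a, b)]


def bone_stream(joints):
    """Bone vectors: each joint minus its parent joint, frame by frame.

    joints[t][v] is the coordinate list of joint v in frame t.
    """
    return [
        [sub(frame[v], frame[PARENTS[v]]) for v in range(len(frame))]
        for frame in joints
    ]


def motion_stream(seq):
    """Frame-to-frame differences; the last frame is zero-padded
    so the output has the same number of frames as the input."""
    motion = [
        [sub(nxt[v], cur[v]) for v in range(len(cur))]
        for cur, nxt in zip(seq, seq[1:])
    ]
    if seq:
        motion.append([[0.0] * len(coord) for coord in seq[-1]])
    return motion


def build_streams(joints):
    """Assumed four-stream input: joints, bones, and the motion of each."""
    bones = bone_stream(joints)
    return {
        "joint": joints,
        "bone": bones,
        "joint_motion": motion_stream(joints),
        "bone_motion": motion_stream(bones),
    }


# Two frames of the toy skeleton; the whole body shifts by +1 along x.
frame0 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0],
          [-1.0, 1.0, 0.0], [-1.0, 2.0, 0.0]]
frame1 = [[x + 1.0, y, z] for x, y, z in frame0]
streams = build_streams([frame0, frame1])
```

In this toy input the body translates rigidly, so the joint-motion stream records the shift while the bone vectors stay constant and the bone-motion stream is all zeros. This illustrates why the streams can carry complementary information.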

References

Yan, Sijie, Yuanjun Xiong, and Dahua Lin. "Spatial temporal graph convolutional networks for skeleton-based action recognition." In Thirty-second AAAI conference on artificial intelligence. 2018.
Chu, Xiao, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, and Xiaogang Wang. "Multi-context attention for human pose estimation." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1831-1840. 2017.
Yang, Wei, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang. "End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3073-3082. 2016.
Du, Yong, Wei Wang, and Liang Wang. "Hierarchical recurrent neural network for skeleton based action recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1110-1118. 2015.
Wang, Hongsong, and Liang Wang. "Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 499-508. 2017.
Caetano, Carlos, Jessica Sena, François Brémond, Jefersson A. Dos Santos, and William Robson Schwartz. "Skelemotion: A new representation of skeleton joint sequences based on motion information for 3d action recognition." In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1-8. IEEE, 2019.
Li, Yanshan, Rongjie Xia, Xing Liu, and Qinghua Huang. "Learning shape-motion representations from geometric algebra spatio-temporal model for skeleton-based action recognition." In 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 1066-1071. IEEE, 2019.
Shi, Lei, Yifan Zhang, Jian Cheng, and Hanqing Lu. "Two-stream adaptive graph convolutional networks for skeleton-based action recognition." In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12026-12035. 2019.
Song, Yi-Fan, Zhang Zhang, Caifeng Shan, and Liang Wang. "Richly activated graph convolutional network for robust skeleton-based action recognition." IEEE Transactions on Circuits and Systems for Video Technology 31, no. 5 (2020): 1915-1925.
Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510-4520. 2018.
Wang, Qilong, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, and Qinghua Hu. "Supplementary Material for “ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks”."
Veeriah, Vivek, Naifan Zhuang, and Guo-Jun Qi. "Differential recurrent neural networks for action recognition." In Proceedings of the IEEE international conference on computer vision, pp. 4041-4049. 2015.
Si, Chenyang, Ya Jing, Wei Wang, Liang Wang, and Tieniu Tan. "Skeleton-based action recognition with spatial reasoning and temporal stack learning." In Proceedings of the European Conference on Computer Vision (ECCV), pp. 103-118. 2018.
Li, Maosen, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, and Qi Tian. "Actional-structural graph convolutional networks for skeleton-based action recognition." In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3595-3603. 2019.
Song, Yi-Fan, Zhang Zhang, and Liang Wang. "Richly activated graph convolutional network for action recognition with incomplete skeletons." In 2019 IEEE International Conference on Image Processing (ICIP), pp. 1-5. IEEE, 2019.
Thakkar, Kalpit, and P. J. Narayanan. "Part-based graph convolutional network for action recognition." arXiv preprint arXiv:1809.04983 (2018).
Li, Chao, Qiaoyong Zhong, Di Xie, and Shiliang Pu. "Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation." arXiv preprint arXiv:1804.06055 (2018).
Li, Bin, Xi Li, Zhongfei Zhang, and Fei Wu. "Spatio-temporal graph routing for skeleton-based action recognition." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 8561-8568. 2019.
Liu, Mengyuan, Hong Liu, and Chen Chen. "Enhanced skeleton visualization for view invariant human action recognition." Pattern Recognition 68 (2017): 346-362.

Published in

ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies
July 2022
155 pages
ISBN: 9781450396936
DOI: 10.1145/3556677

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited