ABSTRACT
In skeleton-based action recognition, methods based on graph convolutional networks (GCNs) have achieved remarkable performance by building skeleton coordinates into spatial-temporal graphs and exploring the relationships between body joints. ST-GCN [19], proposed by Yan et al., is regarded as a pioneering method, as it was the first to introduce GCNs to skeleton-based action recognition. However, it applies graph convolution to the joints of each frame equally, so joints that contribute little interfere with the generation of intermediate feature maps. We design a spatial-temporal attention module to capture significant features in the spatial and temporal dimensions simultaneously. Moreover, we adopt inverted bottleneck temporal convolutional networks to reduce the computational cost and learn more features through a residual structure. Besides the useful information in joints, bones and their movement also contain learnable information for analyzing action categories, so we feed these data into a multi-stream framework. Finally, we demonstrate the efficiency of our proposed MSEA-GCN on the NTU RGB+D datasets.
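The abstract does not say how the bone and movement inputs are derived from the joint coordinates. The sketch below shows one convention that is common in multi-stream skeleton models, and it is an assumption rather than the authors' implementation: a bone is the vector from a joint's parent to the joint, and movement is the difference between consecutive frames. The names `bone_stream`, `motion_stream`, and `parents`, the toy three-joint skeleton, and the zero padding of the last frame are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Frame = List[Vec]  # one 3D coordinate per joint


def sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def bone_stream(frames: Sequence[Frame], parents: Sequence[int]) -> List[Frame]:
    """Bone vector of joint j = joint j minus its parent joint.

    Assumption: the root joint lists itself as its parent, so its bone is zero.
    """
    return [
        [sub(frame[j], frame[parents[j]]) for j in range(len(frame))]
        for frame in frames
    ]


def motion_stream(frames: Sequence[Frame]) -> List[Frame]:
    """Movement at frame t = frame t+1 minus frame t.

    Assumption: the last frame is padded with zeros to keep the length.
    """
    out: List[Frame] = []
    for t, frame in enumerate(frames):
        if t + 1 < len(frames):
            out.append([sub(nxt, cur) for cur, nxt in zip(frame, frames[t + 1])])
        else:
            out.append([(0.0, 0.0, 0.0) for _ in frame])
    return out


# Toy skeleton (hypothetical): joint 0 is the root, 1 hangs off 0, 2 hangs off 1.
parents = [0, 0, 1]
frames = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0)],
]

joint_input = frames
bone_input = bone_stream(frames, parents)
joint_motion_input = motion_stream(frames)
bone_motion_input = motion_stream(bone_input)
```

In a multi-stream setup of this kind, each of these inputs would typically be passed to its own network branch and the branch predictions combined. How MSEA-GCN combines its streams is not stated in the abstract.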
- Yan, Sijie, Yuanjun Xiong, and Dahua Lin. "Spatial temporal graph convolutional networks for skeleton-based action recognition." In Thirty-second AAAI conference on artificial intelligence. 2018.
- Chu, Xiao, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, and Xiaogang Wang. "Multi-context attention for human pose estimation." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1831-1840. 2017.
- Yang, Wei, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang. "End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3073-3082. 2016.
- Du, Yong, Wei Wang, and Liang Wang. "Hierarchical recurrent neural network for skeleton based action recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1110-1118. 2015.
- Wang, Hongsong, and Liang Wang. "Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 499-508. 2017.
- Caetano, Carlos, Jessica Sena, François Brémond, Jefersson A. Dos Santos, and William Robson Schwartz. "Skelemotion: A new representation of skeleton joint sequences based on motion information for 3d action recognition." In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1-8. IEEE, 2019.
- Li, Yanshan, Rongjie Xia, Xing Liu, and Qinghua Huang. "Learning shape-motion representations from geometric algebra spatio-temporal model for skeleton-based action recognition." In 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 1066-1071. IEEE, 2019.
- Shi, Lei, Yifan Zhang, Jian Cheng, and Hanqing Lu. "Two-stream adaptive graph convolutional networks for skeleton-based action recognition." In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12026-12035. 2019.
- Song, Yi-Fan, Zhang Zhang, Caifeng Shan, and Liang Wang. "Richly activated graph convolutional network for robust skeleton-based action recognition." IEEE Transactions on Circuits and Systems for Video Technology 31, no. 5 (2020): 1915-1925.
- Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510-4520. 2018.
- Wang, Qilong, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, and Qinghua Hu. "Supplementary Material for “ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks”."
- Veeriah, Vivek, Naifan Zhuang, and Guo-Jun Qi. "Differential recurrent neural networks for action recognition." In Proceedings of the IEEE international conference on computer vision, pp. 4041-4049. 2015.
- Si, Chenyang, Ya Jing, Wei Wang, Liang Wang, and Tieniu Tan. "Skeleton-based action recognition with spatial reasoning and temporal stack learning." In Proceedings of the European Conference on Computer Vision (ECCV), pp. 103-118. 2018.
- Li, Maosen, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, and Qi Tian. "Actional-structural graph convolutional networks for skeleton-based action recognition." In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3595-3603. 2019.
- Song, Yi-Fan, Zhang Zhang, and Liang Wang. "Richly activated graph convolutional network for action recognition with incomplete skeletons." In 2019 IEEE International Conference on Image Processing (ICIP), pp. 1-5. IEEE, 2019.
- Thakkar, Kalpit, and P. J. Narayanan. "Part-based graph convolutional network for action recognition." arXiv preprint arXiv:1809.04983 (2018).
- Li, Chao, Qiaoyong Zhong, Di Xie, and Shiliang Pu. "Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation." arXiv preprint arXiv:1804.06055 (2018).
- Li, Bin, Xi Li, Zhongfei Zhang, and Fei Wu. "Spatio-temporal graph routing for skeleton-based action recognition." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 8561-8568. 2019.
- Liu, Mengyuan, Hong Liu, and Chen Chen. "Enhanced skeleton visualization for view invariant human action recognition." Pattern Recognition 68 (2017): 346-362.