ABSTRACT
Nowadays, video has gradually become a mainstream medium of communication, and the massive volume of videos makes manual video review increasingly challenging. Using computers to understand videos is therefore of great significance. Among automatic action recognition approaches, skeleton-based methods have several advantages, such as strong robustness to lighting changes, strong ability to express actions, and low computational cost. In this paper, a multi-scale feature augmented graph convolutional network is proposed. It uses a spatial multi-scale GCN module to extract spatial features at different scales and a multi-scale temporal augmentation module to capture temporal features at different scales. To evaluate the proposed method, experiments were performed on two public datasets, NTU-RGB+D and Kinetics-Skeleton. Compared with other advanced action recognition methods, the proposed method recognizes actions effectively and achieves improved recognition accuracy.
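The abstract does not describe the internals of either module, so the following is only a minimal NumPy sketch of one common way to realize "multi-scale" spatial and temporal feature extraction on a skeleton graph, not the authors' implementation. Assumptions: spatial scale k aggregates features from joints exactly k hops away (one normalized adjacency per scale, results summed), and temporal scales are parallel dilated convolutions along the time axis whose outputs are concatenated. The toy 5-joint skeleton and all function names (adjacency, exact_k_hop, normalize, spatial_multiscale_gcn, dilated_temporal_conv, multiscale_temporal) are hypothetical; training, batch normalization, and residual connections are omitted.

```python
import numpy as np

# Hypothetical toy skeleton: 5 joints connected by undirected bones.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
NUM_JOINTS = 5


def adjacency(edges, n):
    """Binary, symmetric joint adjacency matrix (no self-loops)."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A


def exact_k_hop(A, k):
    """Mask of joint pairs whose shortest-path distance is exactly k (k >= 1)."""
    step = np.eye(A.shape[0]) + A
    within_k = np.linalg.matrix_power(step, k) > 0
    within_k_minus_1 = np.linalg.matrix_power(step, k - 1) > 0
    return (within_k & ~within_k_minus_1).astype(float)


def normalize(A_k):
    """Symmetric normalization D^-1/2 (A_k + I) D^-1/2, adding self-loops."""
    A_tilde = A_k + np.eye(A_k.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def spatial_multiscale_gcn(X, A, weights):
    """X: (T, V, C_in); weights: one (C_in, C_out) matrix per scale k = 1..K.
    Sums the k-hop aggregations over all scales, then applies ReLU."""
    out = 0.0
    for k, W in enumerate(weights, start=1):
        A_hat = normalize(exact_k_hop(A, k))
        out = out + np.einsum("uv,tvc,cd->tud", A_hat, X, W)
    return np.maximum(out, 0.0)


def dilated_temporal_conv(X, W, dilation):
    """X: (T, V, C_in); W: (kernel_size, C_in, C_out) with odd kernel_size.
    Zero-padded dilated convolution along time that preserves length T."""
    T = X.shape[0]
    ks = W.shape[0]
    pad = (ks - 1) // 2 * dilation
    Xp = np.pad(X, ((pad, pad), (0, 0), (0, 0)))
    out = np.zeros((T, X.shape[1], W.shape[2]))
    for tau in range(ks):
        # Tap tau reads frame t + (tau - center) * dilation of the original sequence.
        out += Xp[tau * dilation : tau * dilation + T] @ W[tau]
    return out


def multiscale_temporal(X, weights, dilations):
    """One dilated branch per rate; branch outputs are concatenated on channels."""
    branches = [dilated_temporal_conv(X, W, d) for W, d in zip(weights, dilations)]
    return np.maximum(np.concatenate(branches, axis=-1), 0.0)


rng = np.random.default_rng(0)
T, C_in, C_mid, C_branch = 16, 3, 8, 4
X = rng.standard_normal((T, NUM_JOINTS, C_in))  # e.g. 3-D joint coordinates per frame
A = adjacency(EDGES, NUM_JOINTS)
spatial_w = [0.1 * rng.standard_normal((C_in, C_mid)) for _ in range(3)]  # K = 3 scales
temporal_w = [0.1 * rng.standard_normal((3, C_mid, C_branch)) for _ in range(3)]
H = spatial_multiscale_gcn(X, A, spatial_w)                # shape (16, 5, 8)
Y = multiscale_temporal(H, temporal_w, dilations=(1, 2, 3))  # shape (16, 5, 12)
```

Summing spatial scales keeps the channel count fixed, while concatenating temporal branches lets each dilation rate contribute its own channels; either choice could be swapped in this sketch without changing the overall idea of extracting features at several receptive-field sizes.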