DOI: 10.1145/3529466.3529501
research-article

Human Action Recognition Based on Multi-Scale Feature Augmented Graph Convolutional Network

Published: 04 June 2022

ABSTRACT

Video has become a mainstream medium of communication, and the sheer volume of video makes manual review impractical, so using computers to understand video is of great significance. Among approaches to automatic action recognition, skeleton-based methods have several advantages, such as strong robustness to lighting changes, a strong ability to express actions, and low computation time. This paper proposes a multi-scale feature augmented graph convolutional network. It uses a spatial multi-scale GCN module to extract spatial features at different scales and a multi-scale temporal augmentation module to capture temporal features at different scales. To evaluate the proposed method, experiments were performed on two public datasets, NTU-RGB+D and Kinetics-Skeleton. Compared with other advanced action recognition methods, the proposed method recognizes actions effectively and improves recognition accuracy.
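The abstract does not say how the spatial or temporal scales are constructed, so the following is only a rough sketch of the general idea and not the authors' implementation. It assumes that spatial scales are k-hop joint neighborhoods on the skeleton graph and that temporal scales are smoothing windows of different lengths; the helper names (`chain_adjacency`, `k_hop_adjacency`, `multi_scale_spatial`, `multi_scale_temporal`), the toy 5-joint chain skeleton, and the omission of learnable weights are all illustrative assumptions.

```python
import numpy as np


def chain_adjacency(num_joints):
    """Toy skeleton (assumption): joints connected in a simple chain 0-1-2-..."""
    adj = np.zeros((num_joints, num_joints))
    idx = np.arange(num_joints - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def k_hop_adjacency(adj, k):
    """Mark joint pairs whose shortest-path distance is exactly k hops."""
    n = adj.shape[0]
    if k == 0:
        return np.eye(n)
    step = ((adj + np.eye(n)) > 0).astype(int)  # one hop, including self
    within = np.eye(n, dtype=int)               # pairs reachable so far
    for _ in range(k):
        prev = within
        within = ((within @ step) > 0).astype(int)
    # Reachable within k hops but not within k-1 hops.
    return ((within > 0) & (prev == 0)).astype(float)


def row_normalize(mat):
    """Divide each row by its sum so aggregation is a neighborhood average."""
    deg = mat.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # leave isolated rows as all zeros
    return mat / deg


def multi_scale_spatial(x, adj, scales):
    """x has shape (frames, joints, channels); one averaged feature per scale."""
    feats = [
        np.einsum("uv,tvc->tuc", row_normalize(k_hop_adjacency(adj, k)), x)
        for k in scales
    ]
    return np.concatenate(feats, axis=-1)


def multi_scale_temporal(x, windows):
    """Average over centered time windows of several odd lengths."""
    feats = []
    for w in windows:
        pad = w // 2
        xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)), mode="edge")
        feats.append(np.stack([xp[t:t + w].mean(axis=0) for t in range(x.shape[0])]))
    return np.concatenate(feats, axis=-1)


# Usage: 4 frames, 5 joints, 3 coordinates per joint.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 3))
adj = chain_adjacency(5)
spatial = multi_scale_spatial(x, adj, scales=(0, 1, 2))
both = multi_scale_temporal(spatial, windows=(1, 3))
print(spatial.shape, both.shape)  # (4, 5, 9) (4, 5, 18)
```

In a trained network each scale would normally pass through its own learnable projection before the features are combined; plain concatenation is used here only to keep the sketch short.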

  • Published in

    ACM Other conferences
    ICIAI '22: Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence
    March 2022
    240 pages
    ISBN:9781450395502
    DOI:10.1145/3529466

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited
