Human Action Recognition Based on Multi-Scale Feature Augmented Graph Convolutional Network

Authors:
Wangyang Lv, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, China
Yinghua Zhou, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, China

ICIAI '22: Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence, March 2022, Pages 112–118, https://doi.org/10.1145/3529466.3529501

Published: 04 June 2022

ABSTRACT

Video has gradually become a mainstream communication medium, and the massive volume of video makes manual review challenging, so using computers to understand video content is of great significance. Among approaches to automatic action recognition, the skeleton-based approach has many advantages, such as strong robustness to lighting changes, a strong ability to express actions, and low computation time. In this paper, a multi-scale feature augmented graph convolutional network is proposed. It uses a spatial multi-scale GCN module to extract spatial features at different scales and a multi-scale temporal augmentation module to capture temporal features at different scales. To evaluate the proposed method, experiments were performed on two public datasets, NTU-RGB+D and Kinetics-Skeleton. Compared with other advanced action recognition methods, the proposed method recognizes actions effectively and improves recognition accuracy.
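
The idea of extracting spatial features at several scales can be illustrated with a small, dependency-free sketch. It is not the paper's module, since the abstract gives no architectural details. It only assumes that "scale" means the k-hop neighborhood of a joint on the skeleton graph, and it averages joint features over each neighborhood before concatenating the results. The function names, the toy four-joint chain, and the chosen scales are all hypothetical, and there are no learned weights.

```python
from collections import deque


def hop_distances(num_joints, edges):
    """Return dist[i][j], the shortest-path hop count between joints i and j."""
    neighbors = [[] for _ in range(num_joints)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [[None] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def multi_scale_aggregate(features, edges, scales=(1, 2)):
    """Assumed illustration: for each joint, average the features of all joints
    within k hops (including itself) for every k in `scales`, then concatenate."""
    n = len(features)
    dim = len(features[0])
    dist = hop_distances(n, edges)
    out = []
    for i in range(n):
        row = []
        for k in scales:
            members = [j for j in range(n)
                       if dist[i][j] is not None and dist[i][j] <= k]
            row.extend(
                sum(features[j][d] for j in members) / len(members)
                for d in range(dim)
            )
        out.append(row)
    return out


# Toy skeleton: a chain of four joints 0-1-2-3 with one feature per joint.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[0.0], [2.0], [4.0], [6.0]]
result = multi_scale_aggregate(features, edges, scales=(1, 2))
print(result)
```

Each output row holds one averaged value per scale, so a joint's representation mixes information from its immediate neighbors and from joints farther along the skeleton. A graph convolutional layer would additionally apply learned weights to these aggregates.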

Published in

ICIAI '22: Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence
March 2022
240 pages
ISBN: 9781450395502
DOI: 10.1145/3529466

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Graph convolutional network
Human action recognition
Skeleton-based
Qualifiers
- research-article
- Research
- Refereed limited