Research article. DOI: 10.1145/3331453.3361651

Action Recognition Based on Spatial Temporal Graph Convolutional Networks

Published: 22 October 2019

Abstract

Compared with the success of convolutional neural networks in image classification, human action recognition in video still falls short in accuracy and practicality. A major class of action recognition methods is based on the human skeleton, which carries important information for characterizing human motion in video. In this paper, the human skeleton in video is extracted with OpenPose, and a spatial temporal graph of the skeleton is constructed. A spatial temporal graph convolutional network (ST-GCN) extracts the spatial and temporal features of the skeleton across consecutive video frames, and these features are used for video classification. To evaluate action recognition based on ST-GCN, the model is tested on the UCF-101 dataset, where it obtains 50.53% top-1 and 81.58% top-5 accuracy. On a manually constructed UCF-31 dataset, it obtains 68.73% top-1 and 94.43% top-5 accuracy, which verifies that the recognition accuracy of the ST-GCN model also improves when the accuracy of skeleton acquisition improves.
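The pipeline in the abstract (skeleton joints as graph nodes, bones as spatial edges, consecutive frames as the temporal dimension) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 5-joint skeleton, the `BONES` list, the function names, and the fixed averaging filter that stands in for a learned temporal kernel are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton (NOT the OpenPose joint layout):
# 0 = head, 1 = neck, 2 = left hand, 3 = right hand, 4 = hip
NUM_JOINTS = 5
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_joints, bones):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    adj = np.eye(num_joints)  # self-loops keep each joint's own features
    for i, j in bones:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def st_graph_conv(x, adj, weight, temporal_kernel=3):
    """One simplified spatial temporal graph convolution.

    x:      (frames, joints, in_channels) joint features, e.g. 2D coordinates
    adj:    (joints, joints) normalized adjacency
    weight: (in_channels, out_channels) projection shared by all joints
    """
    # Spatial step: each joint aggregates features from its neighbours.
    spatial = np.einsum("vu,tuc->tvc", adj, x) @ weight
    # Temporal step: average each joint over a window of consecutive frames
    # (assumption: a fixed averaging filter replaces a learned kernel).
    pad = temporal_kernel // 2
    padded = np.pad(spatial, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    windows = [padded[k:k + x.shape[0]] for k in range(temporal_kernel)]
    return np.maximum(np.mean(windows, axis=0), 0.0)  # ReLU


rng = np.random.default_rng(0)
frames = rng.normal(size=(8, NUM_JOINTS, 2))  # 8 frames of (x, y) per joint
adj = normalized_adjacency(NUM_JOINTS, BONES)
features = st_graph_conv(frames, adj, rng.normal(size=(2, 4)))
print(features.shape)  # (8, 5, 4)
```

In a full model, several such layers would be stacked and the output pooled over frames and joints before a classifier produces the per-class scores from which top-1 and top-5 accuracy are computed.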


    Published In

    CSAE '19: Proceedings of the 3rd International Conference on Computer Science and Application Engineering
    October 2019
    942 pages
    ISBN: 9781450362948
    DOI: 10.1145/3331453

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Human action recognition
    2. Human skeleton
    3. Temporal and spatial graph convolution
    4. UCF-101 dataset
    5. UCF-31

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSAE 2019

    Acceptance Rates

    Overall Acceptance Rate 368 of 770 submissions, 48%

    Cited By

    • (2024) Extended Multi-stream Temporal-attention Module for Skeleton-based Human Action Recognition (HAR). Computers in Human Behavior (108482). DOI: 10.1016/j.chb.2024.108482. Online publication date: Oct-2024
    • (2023) HAR-time: human action recognition with time factor analysis on worker operating time. International Journal of Computer Integrated Manufacturing 36:8 (1219-1237). DOI: 10.1080/0951192X.2023.2177736. Online publication date: 24-Feb-2023
    • (2023) Determination of workers' compliance to safety regulations using a spatio-temporal graph convolution network. Advanced Engineering Informatics 56 (101942). DOI: 10.1016/j.aei.2023.101942. Online publication date: Apr-2023
    • (2023) A Novel Action Recognition Method Based on Attention Enhancement and Relative Entropy. Proceedings of 2nd International Conference on Artificial Intelligence, Robotics, and Communication (19-26). DOI: 10.1007/978-981-99-4554-2_3. Online publication date: 1-Oct-2023
    • (2022) A Sliding Window Based Approach With Majority Voting for Online Human Action Recognition using Spatial Temporal Graph Convolutional Neural Networks. Proceedings of the 2022 7th International Conference on Machine Learning Technologies (155-163). DOI: 10.1145/3529399.3529425. Online publication date: 11-Mar-2022
    • (2022) Human Action Recognition using BlazePose Skeleton on Spatial Temporal Graph Convolutional Neural Networks. 2022 9th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE) (206-211). DOI: 10.1109/ICITACEE55701.2022.9924010. Online publication date: 25-Aug-2022
    • (2022) Skeleton-Based ST-GCN for Human Action Recognition With Extended Skeleton Graph and Partitioning Strategy. IEEE Access 10 (41403-41410). DOI: 10.1109/ACCESS.2022.3164711. Online publication date: 2022
    • (2021) Research on Gait Evaluation Method Based on Machine Vision. 2021 International Conference on Intelligent Computing, Automation and Applications (ICAA) (357-362). DOI: 10.1109/ICAA53760.2021.00072. Online publication date: Jun-2021
    • (2021) Facial Expressions and Body Postures Emotion Recognition based on Convolutional Attention Network. 2021 International Conference on Computer, Information and Telecommunication Systems (CITS) (1-5). DOI: 10.1109/CITS52676.2021.9618520. Online publication date: 11-Nov-2021
    • (2021) Skeleton-Split Framework using Spatial Temporal Graph Convolutional Networks for Action Recognition. 2021 4th International Conference on Bio-Engineering for Smart Technologies (BioSMART) (1-5). DOI: 10.1109/BioSMART54244.2021.9677634. Online publication date: 8-Dec-2021
