DOI: 10.1145/3581807.3581840
research-article

Human Action Recognition Based on Vision Transformer and L2 Regularization

Published: 22 May 2023

ABSTRACT

In recent years, human action recognition has been a focus of computer vision research, with promising applications in many areas, such as security monitoring, behavior analysis, and network video image restoration. This paper studies a human action recognition method based on the attention mechanism. To improve model accuracy and efficiency, the Vision Transformer (ViT) network structure is used as the feature extraction framework. Because video data has both temporal and spatial characteristics, spatial and temporal attention mechanisms are used for feature extraction in place of a traditional convolutional network. In addition, L2 weight decay regularization is introduced during training to prevent the model from overfitting the training data. Tests on the UCF101 human action dataset show that the proposed model improves recognition accuracy compared with other models.
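
As an illustration of the L2 weight decay mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes a plain SGD update and a penalty of the form (lam / 2) * ||w||^2. The function names, the learning rate, and the coefficient `lam` are hypothetical and do not come from the paper.

```python
# Minimal sketch of L2 weight decay with plain SGD.
# All names and values here are illustrative assumptions, not the paper's code.

def l2_penalty(weights, lam):
    """Return the L2 penalty (lam / 2) * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)


def sgd_step_with_l2(weights, grads, lr, lam):
    """Apply one SGD step to the loss plus the L2 penalty.

    The gradient of the penalty with respect to each weight is lam * w,
    so every weight is pulled toward zero in addition to following
    the gradient of the data loss.
    """
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0, 0.5]
grads = [0.0, 0.0, 0.0]  # zero data gradient isolates the decay effect
penalty = l2_penalty(weights, lam=0.5)
updated = sgd_step_with_l2(weights, grads, lr=0.1, lam=0.5)
print(penalty, updated)
```

With a zero data gradient, each weight is multiplied by (1 - lr * lam), which is why the technique is called weight decay. Deep learning frameworks usually expose this as an optimizer option rather than an explicit loss term.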


    • Published in

      ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition
      November 2022
      683 pages
      ISBN:9781450397056
      DOI:10.1145/3581807

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited