research-article

Human Action Recognition Based on Vision Transformer and L2 Regularization

Authors:
Qiliang Chen, School of Computer and Information Engineering, Xiamen University of Technology, China (ORCID: 0000-0002-3896-9365)
Hasiqidalatu Tang, School of Mathematics and Statistics, Xiamen University of Technology, China (ORCID: 0000-0002-3359-0946)
Jiaxin Cai, School of Mathematics and Statistics, Xiamen University of Technology, China (ORCID: 0000-0002-8989-085X)

ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, November 2022, Pages 224–228, https://doi.org/10.1145/3581807.3581840

Published: 22 May 2023

ABSTRACT

In recent years, human action recognition has been a focus of computer vision research, with promising applications in areas such as security monitoring, behavioral analysis, and network video image restoration. This paper studies an attention-based method for human action recognition. To improve model accuracy and efficiency, the Vision Transformer (ViT) network structure is used as the framework for feature extraction. Because video data has both temporal and spatial characteristics, spatial and temporal attention mechanisms are used for feature extraction in place of a traditional convolutional network. In addition, L2 weight decay regularization is introduced during training to prevent the model from overfitting the training data. Tests on the UCF101 human action dataset show that the proposed model improves recognition accuracy compared with other models.
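The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-head attention with identity query/key/value projections, omits residual connections and layer normalization, and applies temporal attention before spatial attention, in the manner of divided space-time attention. The function names (`attend`, `divided_space_time_attention`, `l2_penalty`, `sgd_step_with_weight_decay`) and the parameters `lr` and `lam` are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(tokens):
    # Single-head self-attention with identity Q/K/V projections
    # (an assumption made to keep the sketch short).
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def divided_space_time_attention(video):
    # video[t][p] is the embedding of patch p in frame t.
    num_frames, num_patches = len(video), len(video[0])
    # Temporal attention: each patch position attends across all frames.
    temporal = [[None] * num_patches for _ in range(num_frames)]
    for p in range(num_patches):
        column = attend([video[t][p] for t in range(num_frames)])
        for t in range(num_frames):
            temporal[t][p] = column[t]
    # Spatial attention: patches within the same frame attend to each other.
    return [attend(frame) for frame in temporal]

def l2_penalty(weights, lam):
    # L2 regularization term: (lam / 2) * ||w||^2, added to the training loss.
    return 0.5 * lam * sum(w * w for w in weights)

def sgd_step_with_weight_decay(weights, grads, lr, lam):
    # The gradient of the L2 term is lam * w, so each weight is shrunk
    # toward zero in addition to following the data gradient.
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]
```

In this sketch the L2 term only changes the update rule: with a zero data gradient, every weight is scaled by `1 - lr * lam` per step, which is why the technique is commonly called weight decay.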

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI design and evaluation methods
      1. Usability testing

Published in

ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition
November 2022
683 pages
ISBN:9781450397056
DOI:10.1145/3581807

Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited
