ABSTRACT
Human action recognition has become a central topic in computer vision in recent years, with promising applications in areas such as security monitoring, behavior analysis, and network video image restoration. This paper studies an attention-based method for human action recognition. To improve accuracy and efficiency, the model uses the Vision Transformer (ViT) network structure as its feature extraction framework. Because video data carries both temporal and spatial features, spatial and temporal attention mechanisms are used for feature extraction in place of a traditional convolutional network. In addition, L2 weight decay regularization is introduced during training to prevent the model from overfitting the training data. Tests on the human action dataset UCF101 show that the proposed model improves recognition accuracy compared with other models.
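As an illustration of the L2 weight decay mentioned above, the following minimal sketch shows one common formulation, in which a penalty of (weight_decay / 2) * ||w||^2 is added to the training loss so that each gradient gains an extra term weight_decay * w. The abstract does not state the exact form or coefficient used in the paper, so the function names, the factor of 1/2, and the numeric values here are assumptions chosen only for illustration.

```python
def l2_penalty(weights, weight_decay):
    """Return the L2 penalty (weight_decay / 2) * sum(w^2).

    Assumed formulation; the paper's exact penalty is not given in the abstract.
    """
    return 0.5 * weight_decay * sum(w * w for w in weights)


def sgd_step_with_l2(weights, grads, lr, weight_decay):
    """Apply one plain SGD update with L2 weight decay.

    The penalty's gradient is weight_decay * w, so it is added to the
    data-loss gradient g before the learning-rate step.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # Hypothetical weights and data-loss gradients, for illustration only.
    weights = [1.0, -2.0]
    grads = [0.0, 0.0]
    print(l2_penalty(weights, weight_decay=0.1))
    print(sgd_step_with_l2(weights, grads, lr=0.1, weight_decay=0.5))
```

With zero data-loss gradients, each update simply shrinks every weight toward zero by the factor (1 - lr * weight_decay), which is the mechanism that discourages large weights and limits overfitting.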
Index Terms
- Human Action Recognition Based on Vision Transformer and L2 Regularization