Research on Human Behavior Recognition Based on 3D-Ghostnet

Author: Yupeng Liu

IoTML '24: Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning

Pages 161 - 166

https://doi.org/10.1145/3697467.3697629

Published: 08 November 2024

Abstract

To address the large parameter counts, low accuracy, and information redundancy of traditional 3D convolutional neural networks, this paper proposes an improved lightweight action recognition network that incorporates an attention mechanism. First, an improved spatio-temporal separated self-attention mechanism is added to strengthen feature extraction. Then, an improved 3D-Ghostnet convolutional neural network is used in the 3D convolutional part to lighten the network and reduce information redundancy. Finally, experiments on several datasets verify the effectiveness of the network. The results show that the proposed behaviour recognition network has a computational cost of 14.85 GFLOPs and 18.83 M parameters, and achieves recognition accuracies of 75.9% on the HMDB51 dataset and 96.2% on the UCF101 dataset, demonstrating the effectiveness of the framework.
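The abstract does not spell out how a Ghost-style module lightens a 3D convolution, so the following is only an illustrative sketch of the parameter arithmetic. It assumes the original 2D GhostNet formulation carried over to 3D kernels: a primary convolution produces `c_out / ratio` intrinsic feature maps, and a cheap depthwise operation generates the remaining ones. The function names, the default `ratio=2`, the depthwise kernel size `cheap_k=3`, and the 64-to-128-channel layer are all assumptions chosen for illustration; none of them are taken from the paper, whose improved 3D-Ghostnet may differ.

```python
def conv3d_params(c_in, c_out, k):
    """Weight count of a bias-free standard 3D convolution with a k*k*k kernel."""
    return c_in * c_out * k ** 3


def ghost3d_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weight count of a Ghost-style 3D module (assumed formulation).

    A primary convolution yields c_out // ratio intrinsic maps; each intrinsic
    map then produces (ratio - 1) extra maps through a depthwise convolution
    with a cheap_k*cheap_k*cheap_k kernel.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = conv3d_params(c_in, intrinsic, k)
    cheap = intrinsic * (ratio - 1) * cheap_k ** 3  # one filter per generated map
    return primary + cheap


# Hypothetical layer: 64 input channels, 128 output channels, 3x3x3 kernel.
standard = conv3d_params(64, 128, 3)
ghost = ghost3d_params(64, 128, 3)
print(standard, ghost, round(standard / ghost, 2))
```

Under these assumptions the Ghost-style layer needs roughly half the weights of the standard layer, and the saving approaches a factor of `ratio` as the input channel count grows, because the depthwise term does not scale with `c_in`.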

Index Terms

Computing methodologies > Artificial intelligence > Computer vision

Published In

IoTML '24: Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning

August 2024

443 pages

ISBN: 9798400710353

DOI: 10.1145/3697467

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

IoTML 2024: 2024 4th International Conference on Internet of Things and Machine Learning

August 9 - 11, 2024

Nanchang, China
