DOI: 10.1145/3442705.3442710
research-article

Human Action Recognition using Pre-trained Convolutional Neural Networks

Published: 21 March 2021

Abstract

Human action recognition is one of the challenges in the field of artificial intelligence. Deep learning models have become a focus of research in action recognition applications because they can outperform traditional machine learning approaches. The Convolutional Neural Network (CNN) is one of the architectures most commonly used in action recognition work. Many CNN models exist, but no study has evaluated which of them performs best at understanding human actions. In this paper, we therefore compare the performance of two pre-trained deep CNN models, GoogLeNet and AlexNet, each with fine-tuned parameters, at classifying human actions to identify different behaviours. A Long Short-Term Memory (LSTM) network is additionally used to predict the label of each video. The paper's main contribution is a performance analysis of these two fine-tuned pre-trained deep CNN models, compared with the results of other recently proposed human action recognition methods on the KTH, Weizmann, UCF11 (YouTube Actions), and UCF-Sports datasets.
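The abstract describes a two-stage pipeline: a pre-trained CNN (GoogLeNet or AlexNet) and an LSTM that predicts the video's label. The sketch below illustrates only the LSTM stage, in NumPy, under stated assumptions: per-frame feature vectors are assumed to be already extracted by the CNN (for example, the 1024-dimensional pooled output of GoogLeNet or the 4096-dimensional fc7 output of AlexNet), the weights are random rather than trained, and the helper names `init_params` and `lstm_classify` are hypothetical. It is not the authors' implementation or configuration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(D, H, C, seed=0):
    """Hypothetical helper: random LSTM and output-layer weights.

    D = per-frame feature size, H = LSTM hidden size, C = number of action classes.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.1, size=(4 * H, D))   # input-to-gates weights
    U = rng.normal(0.0, 0.1, size=(4 * H, H))   # hidden-to-gates weights
    b = np.zeros(4 * H)                          # gate biases
    W_out = rng.normal(0.0, 0.1, size=(C, H))    # final hidden state -> class scores
    b_out = np.zeros(C)
    return W, U, b, W_out, b_out


def lstm_classify(features, params):
    """Run an LSTM over a (T, D) sequence of per-frame CNN features.

    Returns softmax class probabilities (C,) computed from the last hidden state,
    i.e. one label distribution per video.
    """
    W, U, b, W_out, b_out = params
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in features:
        z = W @ x_t + U @ h + b                  # all four gate pre-activations
        i = sigmoid(z[0:H])                      # input gate
        f = sigmoid(z[H:2 * H])                  # forget gate
        o = sigmoid(z[2 * H:3 * H])              # output gate
        g = np.tanh(z[3 * H:4 * H])              # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = W_out @ h + b_out
    logits = logits - logits.max()               # numerically stable softmax
    exp = np.exp(logits)
    return exp / exp.sum()


# Usage: 30 frames of hypothetical 1024-d features, 6 classes (e.g. the six KTH actions)
frames = np.random.default_rng(1).normal(size=(30, 1024))
params = init_params(D=1024, H=128, C=6)
probs = lstm_classify(frames, params)
print("predicted class index:", int(np.argmax(probs)))
```

Classifying from the final hidden state is one common design choice for producing a single label per video; averaging the hidden states over all frames is another. In a real system the LSTM weights would be trained on labelled videos rather than drawn at random.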


Cited By

  • (2024) Transfer Learning Models for CNN Fusion With Fisher Vector for Codebook Optimization of Foreground Features. IEEE Access, vol. 12, pp. 5648-5658. DOI: 10.1109/ACCESS.2023.3339575

Information

Published In

ACM Other conferences
VSIP '20: Proceedings of the 2020 2nd International Conference on Video, Signal and Image Processing
December 2020
108 pages
ISBN: 9781450388931
DOI: 10.1145/3442705
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Action recognition
  2. AlexNet
  3. Convolutional neural network (CNN)
  4. Deep learning
  5. GoogLeNet
  6. Long Short-Term Memory (LSTM)

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

VSIP '20

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 1
Reflects downloads up to 16 Feb 2025.
