Human Action Recognition using Pre-trained Convolutional Neural Networks

Authors:

Fazly Salleh Abas,

Hock Ann Goh

VSIP '20: Proceedings of the 2020 2nd International Conference on Video, Signal and Image Processing

Pages 30 - 34

https://doi.org/10.1145/3442705.3442710

Published: 21 March 2021

Abstract

Recognition of human action is one of the challenges in the field of artificial intelligence. Deep learning has become an active research topic in action recognition because of its ability to outperform traditional machine learning approaches. The Convolutional Neural Network (CNN) is one of the architectures most commonly used in action recognition work. Several CNN models exist, but no study has evaluated which model performs best at understanding human actions. In this paper, we therefore compare the performance of two pre-trained deep CNN models in classifying human actions to identify different behaviours. The two models compared are GoogleNet and AlexNet, each with fine-tuned parameters, and Long Short-Term Memory (LSTM) is used to predict the video labels. The paper's main contribution is a performance analysis of these two fine-tuned, pre-trained deep CNN models, compared against the results of other recently proposed human action recognition methods on the KTH, Weizmann, UCF11 (YouTube Actions) and UCF-Sports datasets.
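
The abstract pairs a pre-trained CNN with Long Short-Term Memory for video label prediction. One common way to combine the two, assumed here and not confirmed by the abstract, is to extract one feature vector per frame with the CNN and let an LSTM summarise the sequence before a softmax layer picks the label. The sketch below illustrates only that second stage in plain Python. The class name `TinyLSTMClassifier`, the dimensions, and the random weights are all hypothetical stand-ins, and the random "frame features" take the place of real GoogleNet or AlexNet activations.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical sketch: LSTM over per-frame features, then softmax."""

    def __init__(self, feat_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [h, x]. Weights are random
        # (untrained); a real model would learn them.
        self.W = {g: rand_matrix(hidden_dim, hidden_dim + feat_dim, rng) for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}
        self.W_out = rand_matrix(num_classes, hidden_dim, rng)

    def step(self, x, h, c):
        """One LSTM time step for a single frame feature vector x."""
        z = h + x  # list concatenation [h, x]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, frames):
        """Run the LSTM over all frames; classify from the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frames:
            h, c = self.step(x, h, c)
        logits = matvec(self.W_out, h)
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage: a fake 5-frame video with 8-dimensional features per frame.
rng = random.Random(1)
video = [[rng.random() for _ in range(8)] for _ in range(5)]
model = TinyLSTMClassifier(feat_dim=8, hidden_dim=4, num_classes=6)
probs = model.predict_proba(video)
label = max(range(len(probs)), key=probs.__getitem__)
```

Because the weights are untrained, the predicted `label` carries no meaning; the point is only the data flow from a frame sequence to a single probability vector per video. In practice the frame features would come from the fine-tuned CNN and the LSTM would be trained on labelled clips.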

Cited By

Kamaleldin M, Abu-Bakar S, Sheikh U (2024). Transfer Learning Models for CNN Fusion With Fisher Vector for Codebook Optimization of Foreground Features. IEEE Access 12, 5648-5658. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2023.3339575

Published In

VSIP '20: Proceedings of the 2020 2nd International Conference on Video, Signal and Image Processing

December 2020

108 pages

ISBN:9781450388931

DOI:10.1145/3442705

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

VSIP '20

VSIP '20: 2020 2nd International Conference on Video, Signal and Image Processing

December 4 - 6, 2020

Jakarta, Indonesia
