
A resource conscious human action recognition framework using 26-layered deep convolutional neural network

Published in Multimedia Tools and Applications

Abstract

Vision-based human action recognition (HAR) has been a hot research topic for the past decade, driven by popular applications such as visual surveillance and robotics. Correct action recognition requires various local and global points, known as features, which change as human movement varies. Because several human actions differ only slightly, their features overlap, which degrades recognition performance. In this article, we design a new 26-layered Convolutional Neural Network (CNN) architecture for accurate recognition of complex actions. Features are extracted from the global average pooling layer and the fully connected (FC) layer and fused by a proposed high-entropy-based approach. We further propose a feature selection method named Poisson distribution along with Univariate Measures (PDaUM). Some of the fused CNN features are irrelevant and some are redundant, which leads to incorrect predictions among complex human actions. The proposed PDaUM-based approach therefore selects only the strongest features, which are then passed to an Extreme Learning Machine (ELM) and a Softmax classifier for final recognition. Four datasets are used for the experimental analysis: HMDB51 (51 classes), UCF Sports (10 classes), KTH (6 classes), and Weizmann (10 classes). On these datasets, the ELM classifier outperforms the Softmax classifier, achieving accuracies of 81.4%, 99.2%, 98.3%, and 98.7%, respectively. Comparison with existing techniques shows that the proposed architecture performs better in terms of both accuracy and testing time.
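The abstract outlines a three-stage pipeline: fuse GAP- and FC-layer CNN features with an entropy criterion, select the strongest features with PDaUM, and classify with an ELM or Softmax. The sketch below illustrates two of those stages under stated assumptions: the paper's exact fusion rule and the PDaUM selector are not specified in the abstract, so the histogram-entropy ranking here is a hypothetical stand-in and the PDaUM step is omitted; only the ELM follows the standard closed-form formulation of Huang et al.

```python
import numpy as np

def feature_entropy(column, bins=32):
    """Shannon entropy of one feature column, estimated from a histogram."""
    hist, _ = np.histogram(column, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return -np.sum(p * np.log2(p))

def entropy_fuse(gap_feats, fc_feats, keep_ratio=0.5):
    """Serially fuse GAP- and FC-layer features, then keep the
    highest-entropy columns (an illustrative assumption, not the
    paper's exact fusion rule)."""
    fused = np.concatenate([gap_feats, fc_feats], axis=1)
    ent = np.array([feature_entropy(fused[:, j]) for j in range(fused.shape[1])])
    keep = np.argsort(ent)[::-1][: int(keep_ratio * fused.shape[1])]
    return fused[:, keep], keep

class ELM:
    """Standard single-hidden-layer Extreme Learning Machine: random,
    frozen input weights; output weights solved in closed form with
    the Moore-Penrose pseudoinverse."""

    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # hidden-layer activations
        T = np.eye(int(y.max()) + 1)[y]       # one-hot class targets
        self.beta = np.linalg.pinv(H) @ T     # closed-form output weights
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

# Toy demo with random stand-ins for the CNN features; in the paper the
# inputs would come from the trained 26-layer network's GAP and FC layers.
rng = np.random.default_rng(1)
gap, fc = rng.standard_normal((200, 512)), rng.standard_normal((200, 256))
labels = rng.integers(0, 6, 200)              # e.g. the 6 KTH classes
X, _ = entropy_fuse(gap, fc)
clf = ELM().fit(X, labels)
print("train accuracy:", (clf.predict(X) == labels).mean())
```

A design note grounded in the abstract's claim: the ELM trains with a single pseudoinverse and predicts with two matrix products, so it is cheap to fit and fast at inference, which is consistent with the reported advantage in testing time.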





Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1F1A1058715).

Author information

Corresponding author

Correspondence to Sanghyun Seo.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Khan, M.A., Zhang, YD., Khan, S.A. et al. A resource conscious human action recognition framework using 26-layered deep convolutional neural network. Multimed Tools Appl 80, 35827–35849 (2021). https://doi.org/10.1007/s11042-020-09408-1

