Abstract
Video analytics has become an essential tool for security monitoring, automating the tedious task of manually reviewing large volumes of CCTV footage. Despite advances in the field, accurately recognizing human actions in video remains challenging because of the complex nature of actions, varied backgrounds, and differing camera angles. To address these difficulties, we developed a novel action recognition model that integrates an attention mechanism with a modified Gated Recurrent Unit (GRU) architecture. Our approach uses Inception v3 for feature extraction, combined with an attention mechanism that focuses on the most informative portions of the input sequence. This allows the model to better identify the key aspects of a video, thereby improving the precision of action recognition. The attention-enhanced features are then processed by the modified GRU, which applies attention to categorize video behaviors more effectively, particularly in complex video sequences. To validate the effectiveness of our model, we conducted extensive experiments on two well-known and challenging benchmarks, the Human Motion Database (HMDB51) and the University of Central Florida (UCF101) dataset. Our model achieved notable accuracy rates of 75.32% on HMDB51 and 96.82% on UCF101, demonstrating its capability to address the complexities of human action recognition in video. These results highlight the potential of our approach to advance the state of the art in video analytics.
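The core idea described above, pooling per-frame features (such as Inception v3 outputs) with learned attention weights before classification, can be illustrated with a minimal sketch. This is not the authors' Iv3-MGRUA implementation; the additive-attention form, the dimensions, and all variable names here are assumptions chosen for illustration, with random values standing in for learned parameters and real frame features.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(frame_feats, w, v):
    """Additive attention over a sequence of frame features.

    frame_feats: (T, D) per-frame features (e.g. Inception v3 pooled outputs)
    w: (D, H) projection matrix, v: (H,) scoring vector
    Returns the attention-weighted summary (D,) and the weights (T,).
    """
    scores = np.tanh(frame_feats @ w) @ v   # one relevance score per frame, (T,)
    alpha = softmax(scores)                 # attention weights, sum to 1
    context = alpha @ frame_feats           # weighted sum over frames, (D,)
    return context, alpha

# Hypothetical clip: 16 frames of 2048-dim features (Inception v3 pool size)
T, D, H = 16, 2048, 64
feats = rng.standard_normal((T, D))
w = rng.standard_normal((D, H)) * 0.01
v = rng.standard_normal(H)

context, alpha = attention_pool(feats, w, v)
print(alpha.shape, context.shape)
```

In the full model, `context` (or the attention-reweighted frame sequence) would feed a recurrent classifier such as the modified GRU rather than being used directly.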
Data availability
No datasets were generated or analysed during the current study.
Author information
Authors and Affiliations
Contributions
M.J. wrote the main manuscript text and S.Y. prepared all figures. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jayamohan, M., Yuvaraj, S. Iv3-MGRUA: a novel human action recognition features extraction using Inception v3 and video behaviour prediction using modified gated recurrent units with attention mechanism model. SIViP 19, 134 (2025). https://doi.org/10.1007/s11760-024-03726-9
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11760-024-03726-9