Abstract
Video analytics has become an essential tool for security monitoring, automating the tedious task of manually reviewing large volumes of CCTV footage. Despite advances in the field, accurately recognizing human actions in video remains challenging because of the complex nature of actions, varied backgrounds, and differing camera angles. To address these difficulties, we developed a novel action recognition model that integrates an attention mechanism with a modified Gated Recurrent Unit (GRU) architecture. Our approach uses Inception v3 for feature extraction, combined with an attention mechanism that focuses on the most informative portions of the input sequence. This allows the model to better identify the key aspects of a video, thereby improving the precision of action recognition. The attention-enhanced features are then processed by the modified GRU, which applies attention to categorize video behaviors more effectively, particularly in complex video sequences. To validate the effectiveness of our model, we conducted extensive experiments on two well-known and challenging benchmarks, the Human Motion Database (HMDB51) and the University of Central Florida (UCF101) dataset. Our model achieved notable accuracy rates of 75.32% on HMDB51 and 96.82% on UCF101, demonstrating its capability to address the complexities of human action recognition in video. These results highlight the potential of our approach to advance the state of the art in video analytics.
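The core idea described above, pooling per-frame features (such as Inception v3 outputs) with learned attention weights before classification, can be illustrated with a minimal sketch. This is not the authors' Iv3-MGRUA implementation; the additive-attention form, the dimensions, and all variable names here are assumptions chosen for illustration, with random values standing in for learned parameters and real frame features.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(frame_feats, w, v):
    """Additive attention over a sequence of frame features.

    frame_feats: (T, D) per-frame features (e.g. Inception v3 pooled outputs)
    w: (D, H) projection matrix, v: (H,) scoring vector
    Returns the attention-weighted summary (D,) and the weights (T,).
    """
    scores = np.tanh(frame_feats @ w) @ v   # one relevance score per frame, (T,)
    alpha = softmax(scores)                 # attention weights, sum to 1
    context = alpha @ frame_feats           # weighted sum over frames, (D,)
    return context, alpha

# Hypothetical clip: 16 frames of 2048-dim features (Inception v3 pool size)
T, D, H = 16, 2048, 64
feats = rng.standard_normal((T, D))
w = rng.standard_normal((D, H)) * 0.01
v = rng.standard_normal(H)

context, alpha = attention_pool(feats, w, v)
print(alpha.shape, context.shape)
```

In the full model, `context` (or the attention-reweighted frame sequence) would feed a recurrent classifier such as the modified GRU rather than being used directly.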
Data availability
No datasets were generated or analysed during the current study.
Author information
Authors and Affiliations
Contributions
M.J. wrote the main manuscript text and S.Y. prepared all figures. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jayamohan, M., Yuvaraj, S. Iv3-MGRUA: a novel human action recognition features extraction using Inception v3 and video behaviour prediction using modified gated recurrent units with attention mechanism model. SIViP 19, 134 (2025). https://doi.org/10.1007/s11760-024-03726-9
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11760-024-03726-9