Iv3-MGRUA: a novel human action recognition features extraction using Inception v3 and video behaviour prediction using modified gated recurrent units with attention mechanism model

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Video analytics has become an essential tool for improving security monitoring by automating the tedious task of manually reviewing large volumes of CCTV footage. Despite advances in the field, accurately recognizing human actions in video remains challenging because of the complex nature of actions, varied backgrounds, and differing camera angles. To address these difficulties, we developed a novel action recognition model that integrates an attention mechanism with a modified Gated Recurrent Unit (GRU) architecture. Our approach uses Inception v3 for feature extraction, combined with an attention mechanism that focuses on the most informative portions of the input sequence. This allows the model to better identify the key aspects of a video, thereby enhancing the precision of action recognition. The attention-enhanced features are then processed by the modified GRU, which categorizes video behaviors more effectively, particularly for complex video sequences. To validate the effectiveness of our model, we conducted extensive tests on two well-known and challenging datasets, the Human Motion Database (HMDB51) and the University of Central Florida UCF101 dataset. The model achieved notable accuracy rates of 75.32% on HMDB51 and 96.82% on UCF101, demonstrating its capability to address the complexities of human action recognition in video. These results highlight the potential of our approach for advancing the state of the art in video analytics.
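The pipeline the abstract describes — per-frame CNN features, attention reweighting over the frame sequence, a GRU over the attended features, and a softmax over action classes — can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the dimensions (2048-d features standing in for Inception v3 pool outputs, 16 frames, 64 hidden units, 101 classes as in UCF101), the additive attention form, and the random stand-in features are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU cell update (biases omitted for brevity)."""
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde

# Hypothetical sizes: 16 frames, 2048-d features, 64 hidden units, 101 classes.
T, D, H, C = 16, 2048, 64, 101

# Stand-in for Inception v3 frame features (in practice: CNN outputs per frame).
feats = rng.standard_normal((T, D))

# Additive attention over frames: score each frame, normalize, reweight.
w_att = rng.standard_normal(D)
alpha = softmax(feats @ w_att)      # attention weights over the T frames
attended = feats * alpha[:, None]   # attention-enhanced feature sequence

# Run the GRU over the attended sequence.
params = [rng.standard_normal((D, H)) * 0.01 if i % 2 == 0
          else rng.standard_normal((H, H)) * 0.01 for i in range(6)]
h = np.zeros(H)
for t in range(T):
    h = gru_step(attended[t], h, *params)

# Classify the final hidden state into one of C action classes.
W_out = rng.standard_normal((H, C)) * 0.01
probs = softmax(h @ W_out)
```

With trained weights in place of the random matrices, `probs` would give the predicted distribution over action classes for the clip.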


Figures 1–11 are available in the full article.


Data availability

No datasets were generated or analysed during the current study.


Author information

Authors and Affiliations

Authors

Contributions

M.J. wrote the main manuscript text and S.Y. prepared all figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to S. Yuvaraj.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jayamohan, M., Yuvaraj, S. Iv3-MGRUA: a novel human action recognition features extraction using Inception v3 and video behaviour prediction using modified gated recurrent units with attention mechanism model. SIViP 19, 134 (2025). https://doi.org/10.1007/s11760-024-03726-9

  • DOI: https://doi.org/10.1007/s11760-024-03726-9
