
Modeling transformer architecture with attention layer for human activity recognition

  • Original Article
Neural Computing and Applications

Abstract

Human activity recognition (HAR) is essential in numerous fields, including medicine, sports, and security. Traditional HAR methods often rely on complex feature extraction from raw input data, while convolutional neural networks (CNNs) are primarily designed for 2D data. This research aims to address these challenges by introducing a new model that combines a 3D CNN architecture with an attention layer. The proposed approach leverages both spatial and temporal attributes to improve action detection and to better capture human movement across adjacent frames. A 3D convolution transformer is employed to capture intricate spatial and temporal features and to generate multiple data channels from input frames, and performance is optimized through regularization and model ensemble techniques. The model achieves accuracies of 98.09% and 99.09% on the Weizmann and UCF101 benchmark datasets, respectively. These results underscore the model's effectiveness in accurately identifying human activities in movie-based natural environments.
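
As a rough illustration of the two ingredients the abstract names, the sketch below applies a single-channel 3D convolution to a short clip and then runs scaled dot-product self-attention over the resulting per-time-step features. It is not the authors' model: the function names (`conv3d_valid`, `self_attention`), the toy clip, the all-ones kernel, and the identity query/key/value projections are all assumptions made for the example, and a real network would use learned weights and many channels.

```python
import math


def conv3d_valid(clip, kernel):
    """Valid (no padding, stride 1) 3D cross-correlation of a single-channel clip.

    clip:   nested list indexed [t][y][x] (frames, height, width)
    kernel: nested list indexed [dt][dy][dx]
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames, so each output value
                # mixes spatial and temporal neighbours.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with Q = K = V = features.

    Identity projections are an assumption of this sketch; a trained
    attention layer would apply learned linear maps first.
    """
    d = len(features[0])
    attended = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, features)) for i in range(d)]
        )
    return attended


# Toy clip: 4 frames of 3x3 pixels, where every pixel of frame t has value t.
clip = [[[float(t) for _ in range(3)] for _ in range(3)] for t in range(4)]
kernel = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]  # 2x2x2 box filter

volume = conv3d_valid(clip, kernel)  # 3 time steps of 2x2 responses
features = [[v for row in plane for v in row] for plane in volume]  # flatten per step
attended = self_attention(features)
print(len(attended), len(attended[0]))
```

Flattening each time step into one feature vector before attention is only one possible design; it lets every time step weigh every other one, which is the kind of cross-frame relationship the abstract describes.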

Data availability

The data associated with this work will be provided on reasonable request.

Author information

Corresponding author

Correspondence to Rajiv Singh.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding this manuscript and that they received no funding for this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pareek, G., Nigam, S. & Singh, R. Modeling transformer architecture with attention layer for human activity recognition. Neural Comput & Applic 36, 5515–5528 (2024). https://doi.org/10.1007/s00521-023-09362-7

  • DOI: https://doi.org/10.1007/s00521-023-09362-7
