
Unethical human action recognition using deep learning based hybrid model for video forensics

Published in: Multimedia Tools and Applications

Abstract

With the rapid growth of multimedia collections worldwide, video forensics faces new obstacles in recognizing human actions in video surveillance systems, human-computer interaction, and similar settings, which require multi-activity recognition systems. Recognizing human activities from video sequences or still images is difficult due to background clutter, partial occlusion, scaling, viewpoint, lighting, and appearance variations. The literature offers a variety of deep learning methods for unethical human action recognition that are effective at learning low-level temporal and spatial features but struggle to learn high-level features, which limits a model's feature-learning capability and leads to poor performance. From a digital forensic perspective, deep analysis of video has become a prerequisite for human action recognition methods concerned with cyber-crime investigation and prevention. In this paper, we propose a deep learning based hybrid model for unethical human action recognition that combines a two-stream Inflated 3D ConvNet (I3D) with spatio-temporal attention (STA) modules. The I3D model improves on the 3D CNN architecture by inflating 2D convolution kernels into 3D kernels, while STA increases learning capability by attending to each frame's spatial and temporal information. To test our model, we built a multi-action dataset from subsets of diverse datasets (Weizmann, HMDB51, UCF-101, NPDI, and UCF-Crime) and compared the proposed model with existing models on both single-action and multi-action datasets, demonstrating better performance.
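The kernel-inflation step the abstract refers to (bootstrapping 3D filters from pretrained 2D ones, as introduced with I3D) can be sketched in NumPy. This is an illustrative sketch of the general technique, not the authors' implementation; the function name and array conventions are assumptions:

```python
import numpy as np

def inflate_kernel(k2d: np.ndarray, time_depth: int) -> np.ndarray:
    """Inflate a 2D convolution kernel of shape (H, W) into a 3D kernel
    of shape (T, H, W) by repeating it along the temporal axis and
    rescaling by 1/T. A static video (the same frame repeated T times)
    then produces the same activation the 2D filter gave on one frame."""
    k3d = np.repeat(k2d[np.newaxis, :, :], time_depth, axis=0)
    return k3d / time_depth
```

Because of the 1/T rescaling, summing the inflated kernel over its temporal axis recovers the original 2D weights, which is what preserves the pretrained 2D network's responses on temporally constant input.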

Materials Availability

The authors confirm that the data supporting the findings of this study are available within the article.

Code Availability

The code is held by the authors and will be provided to the research community on request and mutual agreement.


Funding

No funding was received for conducting this study.

Author information

Contributions

All the authors contributed to the study’s conception and design.

Corresponding author

Correspondence to Raghavendra Gowada.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest or competing interests to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gowada, R., Pawar, D. & Barman, B. Unethical human action recognition using deep learning based hybrid model for video forensics. Multimed Tools Appl 82, 28713–28738 (2023). https://doi.org/10.1007/s11042-023-14508-9
