Abstract
With crime increasing in both crowded and remote areas, video surveillance systems need to recognize abnormal or violent events automatically. Anomaly detection remains a challenging computer vision task because colors, backgrounds, and illumination vary across scenes. In recent years, vision transformers and attention modules in deep learning models have shown promising results. This paper presents an attention-based anomaly detection framework that focuses on the extraction of spatial features. The framework is implemented in two steps. The first step extracts spatial features with a Spatial Attention Module (SAM) and a Shifted Window (SWIN) transformer. The second step performs binary classification of abnormal or violent activities on the extracted features via fully connected layers. A performance analysis of pretrained SWIN transformer variants is also presented to guide the choice of model. Four public benchmark datasets, namely CUHK Avenue, University of Minnesota (UMN), AIRTLab, and Industrial Surveillance (IS), are used for analysis and implementation. The proposed framework outperformed existing state-of-the-art methods by 18% on IS and by 2–20% on Avenue, with accuracies of 98.58% (IS) and 100% (Avenue), respectively.
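The two-step pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes a CBAM-style spatial attention (channel-wise average and max pooling, a learned combination, then a sigmoid), replaces the usual larger convolution kernel with per-map scalar weights, and omits the pretrained SWIN backbone that would sit between the attention step and the classifier. All function names, weights, and the toy feature map are hypothetical.

```python
import math
from typing import List

# A feature map indexed as [channel][row][col]; a stand-in for real frame features.
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(x: FeatureMap, w_avg: float, w_max: float,
                      bias: float) -> List[List[float]]:
    """Return an H x W mask in (0, 1) built from channel-wise avg and max pooling.

    Assumption: scalar weights (a 1x1 convolution) stand in for the larger
    kernel a CBAM-style module would normally learn.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            values = [x[c][i][j] for c in range(channels)]
            avg_pooled = sum(values) / channels
            max_pooled = max(values)
            row.append(sigmoid(w_avg * avg_pooled + w_max * max_pooled + bias))
        mask.append(row)
    return mask


def apply_attention(x: FeatureMap, mask: List[List[float]]) -> FeatureMap:
    """Reweight every channel by the same spatial mask."""
    return [[[ch[i][j] * mask[i][j] for j in range(len(mask[0]))]
             for i in range(len(mask))] for ch in x]


def classify(x: FeatureMap, fc_weights: List[float], fc_bias: float) -> float:
    """Step 2: global-average-pool each channel, then one fully connected unit."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    logit = sum(w * p for w, p in zip(fc_weights, pooled)) + fc_bias
    return sigmoid(logit)


# Toy 2-channel, 2x2 feature map with hypothetical, untrained weights.
frame_features = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[1.0, 0.0], [0.0, 5.0]],
]
mask = spatial_attention(frame_features, w_avg=1.0, w_max=1.0, bias=-2.0)
attended = apply_attention(frame_features, mask)
score = classify(attended, fc_weights=[0.5, 0.5], fc_bias=-1.0)
label = "abnormal" if score >= 0.5 else "normal"
```

In this sketch the mask emphasizes spatial positions with strong activations across channels, and the classifier turns the attended features into one probability that is thresholded at 0.5. In a real system the weights would be learned, and the attended features would pass through the transformer before classification.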
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Garg, A., Nigam, S., Singh, R., Shastri, A., Singh, M. (2024). Spatial Attention Transformer Based Framework for Anomaly Classification in Image Sequences. In: Choi, B.J., Singh, D., Tiwary, U.S., Chung, WY. (eds) Intelligent Human Computer Interaction. IHCI 2023. Lecture Notes in Computer Science, vol 14532. Springer, Cham. https://doi.org/10.1007/978-3-031-53830-8_6
DOI: https://doi.org/10.1007/978-3-031-53830-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53829-2
Online ISBN: 978-3-031-53830-8
eBook Packages: Computer Science, Computer Science (R0)