Abstract
In recent years, video anomaly detection from unlabeled data has drawn growing attention, giving rise to the task of Unsupervised Video Anomaly Detection (UVAD). However, most existing approaches focus predominantly on global features derived from entire frames while overlooking local features associated with individual objects. This oversight can lead to sub-optimal performance and a loss of the scene semantics and explainability that local features provide. In this paper, we introduce a Global-Local Explainable Network (GLE) for the UVAD task, which focuses on local features and builds on a Multi-Instance Learning (MIL) method. The proposed approach not only outperforms state-of-the-art UVAD methods but also explains detected anomalies by leveraging the rich information within local features. Our experiments show that GLE achieves state-of-the-art performance in both detection and explanation: it improves the AUC for abnormal-event detection by up to 6.22 points on two widely used datasets, UCF-Crime and ShanghaiTech, and improves explanatory capability by up to 62.94%, as validated on the X-MAN dataset.
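The abstract does not detail how GLE instantiates Multi-Instance Learning, but MIL for video anomaly detection is commonly formulated as a ranking objective between bags of video segments: an anomalous video (positive bag) should contain at least one high-scoring segment, while every segment of a normal video (negative bag) should score low. The sketch below shows this generic hinge-style MIL ranking loss; it is an illustrative assumption about the MIL family GLE belongs to, not the authors' actual training objective.

```python
def mil_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Generic MIL hinge ranking loss over two bags of segment scores.

    pos_scores: anomaly scores for segments of an anomalous video (positive bag);
                at least one segment is expected to score high.
    neg_scores: anomaly scores for segments of a normal video (negative bag);
                all segments are expected to score low.
    The loss pushes the top-scoring positive segment above the
    top-scoring negative segment by at least `margin`.
    """
    return max(0.0, margin - max(pos_scores) + max(neg_scores))

# Example: per-segment scores for one anomalous and one normal video
pos = [0.1, 0.9, 0.3]
neg = [0.2, 0.4, 0.1]
loss = mil_ranking_loss(pos, neg)  # max(0, 1 - 0.9 + 0.4) = 0.5
```

In practice the segment scores would come from a learned scoring network over per-segment (or, for a local-feature approach, per-object) embeddings, and the loss would be averaged over many bag pairs per batch.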
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yang, H., Khouadjia, M., Seghouani, N., Ma, Y., Delmas, S. (2025). Explainable Action-Recognition Based Approach for Unsupervised Video Anomaly Detection. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2024. Lecture Notes in Computer Science, vol 15046. Springer, Cham. https://doi.org/10.1007/978-3-031-77392-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77391-4
Online ISBN: 978-3-031-77392-1
eBook Packages: Computer Science (R0)