Explainable Action-Recognition Based Approach for Unsupervised Video Anomaly Detection

  • Conference paper
Advances in Visual Computing (ISVC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15046)

Abstract

In recent years, there has been a growing focus on video anomaly detection from unlabeled data, giving rise to the task of Unsupervised Video Anomaly Detection (UVAD). However, most existing approaches focus predominantly on global features derived from entire frames while overlooking local features associated with individual objects. This oversight can result in sub-optimal performance and in the loss of the scene-level semantics and explainability that local features provide. In this paper, we address UVAD with a Global-Local Explainable Network (GLE), which focuses on local features and builds on a Multi-Instance Learning (MIL) method. The proposed approach not only outperforms state-of-the-art UVAD approaches but also explains detected anomalies by leveraging the rich information carried by local features. Our experiments demonstrate that GLE achieves state-of-the-art performance in both detection and explanation: it yields a gain of up to 6.22 in AUC for abnormal-event detection on two widely used datasets, UCF-Crime and ShanghaiTech, and up to a 62.94% improvement in explanatory capability, as validated on the X-MAN dataset.
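
The abstract only outlines the method at a high level. To make the MIL-based scoring and the AUC-based evaluation it refers to concrete, the sketch below shows a generic segment scorer trained with a ranking hinge loss and the standard frame-level AUC metric. This is a minimal sketch, not the authors' GLE implementation: the SegmentScorer module, the feature dimension, the ranking margin, and the use of pre-extracted segment features are all illustrative assumptions.

```python
# Minimal sketch (not the paper's GLE model) of MIL-style anomaly scoring:
# a segment scorer trained with a ranking hinge loss, plus the frame-level
# AUC metric commonly used to report video anomaly detection performance.
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score


class SegmentScorer(nn.Module):
    """Maps a segment feature vector to an anomaly score in [0, 1]."""

    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, 1), nn.Sigmoid(),
        )

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (num_segments, feat_dim) -> (num_segments,) scores
        return self.net(segments).squeeze(-1)


def mil_ranking_loss(scores_pos: torch.Tensor, scores_neg: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """Hinge loss pushing the top score of a (pseudo-)anomalous bag
    above the top score of a (pseudo-)normal bag."""
    return torch.relu(margin - scores_pos.max() + scores_neg.max())


def frame_level_auc(frame_scores, frame_labels) -> float:
    """Frame-level ROC AUC, the metric behind the reported gains."""
    return roc_auc_score(frame_labels, frame_scores)


if __name__ == "__main__":
    # Random features stand in for real per-segment video features.
    scorer = SegmentScorer(feat_dim=2048)
    pos_bag = torch.randn(32, 2048)  # segments of a pseudo-anomalous video
    neg_bag = torch.randn(32, 2048)  # segments of a pseudo-normal video
    loss = mil_ranking_loss(scorer(pos_bag), scorer(neg_bag))
    loss.backward()
```

In an unsupervised setting the positive and negative bags would come from pseudo-labels rather than ground-truth annotations; how GLE forms its global and local instances and produces explanations is described in the paper itself.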


Author information


Corresponding author

Correspondence to Hui Yang.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, H., Khouadjia, M., Seghouani, N., Ma, Y., Delmas, S. (2025). Explainable Action-Recognition Based Approach for Unsupervised Video Anomaly Detection. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2024. Lecture Notes in Computer Science, vol 15046. Springer, Cham. https://doi.org/10.1007/978-3-031-77392-1_14

  • DOI: https://doi.org/10.1007/978-3-031-77392-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77391-4

  • Online ISBN: 978-3-031-77392-1

  • eBook Packages: Computer Science, Computer Science (R0)
