Abstract
In recent years, video anomaly detection from unlabeled data has drawn growing attention, giving rise to the task of Unsupervised Video Anomaly Detection (UVAD). However, most existing approaches focus predominantly on global features derived from entire frames while overlooking local features associated with individual objects. This oversight can lead to sub-optimal performance and a loss of the scene semantics and explainability that local features provide. In this paper, we introduce a Global-Local Explainable Network (GLE) for the UVAD task, which focuses on local features and builds on a Multi-Instance Learning (MIL) method. The proposed approach not only outperforms state-of-the-art UVAD methods but also explains detected anomalies by leveraging the rich information within local features. Our experiments show that GLE achieves state-of-the-art performance in both detection and explanation: it improves the AUC for abnormal-event detection by up to 6.22 points on two widely used datasets, UCF-Crime and ShanghaiTech, and improves explanatory capability by up to 62.94%, as validated on the X-MAN dataset.
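The abstract does not detail how GLE instantiates Multi-Instance Learning, but MIL for video anomaly detection is commonly formulated as a ranking objective between bags of video segments: an anomalous video (positive bag) should contain at least one high-scoring segment, while every segment of a normal video (negative bag) should score low. The sketch below shows this generic hinge-style MIL ranking loss; it is an illustrative assumption about the MIL family GLE belongs to, not the authors' actual training objective.

```python
def mil_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Generic MIL hinge ranking loss over two bags of segment scores.

    pos_scores: anomaly scores for segments of an anomalous video (positive bag);
                at least one segment is expected to score high.
    neg_scores: anomaly scores for segments of a normal video (negative bag);
                all segments are expected to score low.
    The loss pushes the top-scoring positive segment above the
    top-scoring negative segment by at least `margin`.
    """
    return max(0.0, margin - max(pos_scores) + max(neg_scores))

# Example: per-segment scores for one anomalous and one normal video
pos = [0.1, 0.9, 0.3]
neg = [0.2, 0.4, 0.1]
loss = mil_ranking_loss(pos, neg)  # max(0, 1 - 0.9 + 0.4) = 0.5
```

In practice the segment scores would come from a learned scoring network over per-segment (or, for a local-feature approach, per-object) embeddings, and the loss would be averaged over many bag pairs per batch.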
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yang, H., Khouadjia, M., Seghouani, N., Ma, Y., Delmas, S. (2025). Explainable Action-Recognition Based Approach for Unsupervised Video Anomaly Detection. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2024. Lecture Notes in Computer Science, vol 15046. Springer, Cham. https://doi.org/10.1007/978-3-031-77392-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77391-4
Online ISBN: 978-3-031-77392-1
eBook Packages: Computer Science (R0)