research-article

Multi-instance learning anomaly event detection based on Transformer

Author: Yuelei Xiao

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

Pages 846 - 851

https://doi.org/10.1145/3573942.3574104

Published: 16 May 2023

Abstract

Multi-instance learning (MIL) is the dominant approach to weakly supervised anomaly detection in surveillance videos. Features extracted by networks such as Convolutional 3D (C3D) or Inflated 3D ConvNet (I3D) alone capture video context poorly, which has prompted a variety of attention-based abnormal event detection algorithms. The Vision Transformer (ViT) applies the Transformer to computer vision and demonstrates superior performance. In this paper, we propose a Transformer-based multi-instance learning method for anomaly event detection, called MIL-ViT. It uses a pre-trained I3D model to extract spatio-temporal features, feeds those features into the ViT encoder to extract the most salient information, and outputs anomaly scores. We further introduce the MIL ranking loss and a center loss function to improve training. Experimental results on two benchmark datasets (ShanghaiTech and UCF-Crime) show that our method achieves a significantly higher AUC than several recent state-of-the-art methods.
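
As a rough illustration of how the two training losses named in the abstract could be combined, the sketch below uses NumPy. It assumes the common hinge form of the MIL ranking loss (the top-scoring segment of an abnormal bag should outscore the top-scoring segment of a normal bag by a margin) and a center loss that pulls the scores of normal segments toward their mean. The function names, the margin of 1.0, and the weight `center_weight` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge ranking loss between the top-scoring segment of each bag.

    Each argument holds the per-segment anomaly scores of one video (one bag).
    The margin value is an assumption for illustration.
    """
    top_abnormal = float(np.max(abnormal_scores))
    top_normal = float(np.max(normal_scores))
    return max(0.0, margin - top_abnormal + top_normal)


def center_loss(normal_scores):
    """Mean squared distance of normal-segment scores from their own mean.

    This is one plausible form of a center loss on anomaly scores; the
    paper's exact formulation may differ.
    """
    scores = np.asarray(normal_scores, dtype=float)
    return float(np.mean((scores - scores.mean()) ** 2))


def total_loss(abnormal_scores, normal_scores, center_weight=0.1):
    """Ranking loss plus a weighted center term (weight is hypothetical)."""
    ranking = mil_ranking_loss(abnormal_scores, normal_scores)
    return ranking + center_weight * center_loss(normal_scores)


if __name__ == "__main__":
    abnormal_bag = [0.1, 0.9, 0.2]  # one segment looks anomalous
    normal_bag = [0.1, 0.2, 0.3]    # all segments should score low
    print(total_loss(abnormal_bag, normal_bag))
```

In this form the ranking term needs only video-level labels, since it compares the maximum score of each bag, while the center term discourages spread among the scores of a normal video.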

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Scene anomaly detection

Published In

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 2022

1221 pages

ISBN:9781450396899

DOI:10.1145/3573942

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AIPR 2022: 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 23 - 25, 2022

Xiamen, China
