research-article

Scene-Aware Context Reasoning for Unsupervised Abnormal Event Detection in Videos

Authors:
Che Sun (Beijing Institute of Technology, Beijing, China), Yunde Jia (Beijing Institute of Technology, Beijing, China), Yao Hu (Alibaba Youku Cognitive and Intelligent Lab, Beijing, China), Yuwei Wu (Beijing Institute of Technology, Beijing, China)

MM '20: Proceedings of the 28th ACM International Conference on Multimedia, October 2020, Pages 184–192, https://doi.org/10.1145/3394171.3413887

Published: 12 October 2020

ABSTRACT

In this paper, we propose a scene-aware context reasoning method that exploits context information from visual features for unsupervised abnormal event detection in videos, bridging the semantic gap between visual context and the meaning of abnormal events. In particular, we build a spatio-temporal context graph to model visual context information, including the appearances of objects, the spatio-temporal relationships among objects, and scene types. The context information is encoded into the nodes and edges of the graph, and their states are iteratively updated by multiple RNNs with message passing for context reasoning. To infer the spatio-temporal context graph in various scenes, we develop a graph-based deep Gaussian mixture model that clusters scenes in an unsupervised manner. We then compute frame-level anomaly scores from the context information to discriminate abnormal events in various scenes. Evaluations on three challenging datasets, UCF-Crime, Avenue, and ShanghaiTech, demonstrate the effectiveness of our method.
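The iterative state update described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a plain tanh recurrent cell shared by all nodes, mean aggregation over neighbours, and random weights, and the names `message_passing`, `w_self`, and `w_msg` are hypothetical. It only shows the general idea of refining node states by repeatedly exchanging messages along graph edges.

```python
import numpy as np

def message_passing(node_states, adjacency, w_self, w_msg, num_steps=3):
    """Iteratively refine node states on a graph (illustrative only).

    node_states: (N, D) initial state per node, e.g. object appearance features.
    adjacency:   (N, N) 0/1 matrix; adjacency[i, j] = 1 means node j sends to node i.
    w_self, w_msg: (D, D) weights of a simple tanh recurrent cell (assumed form).
    """
    h = node_states.astype(float)
    # Divide by in-degree so each node averages its neighbours' states;
    # isolated nodes keep a divisor of 1 and receive a zero message.
    degree = adjacency.sum(axis=1, keepdims=True)
    norm_adj = adjacency / np.maximum(degree, 1.0)
    for _ in range(num_steps):
        messages = norm_adj @ h                      # aggregate neighbour states
        h = np.tanh(h @ w_self + messages @ w_msg)   # recurrent state update
    return h

rng = np.random.default_rng(0)
num_nodes, dim = 4, 8
states = rng.normal(size=(num_nodes, dim))
# Nodes 0, 1, 2 are mutually connected; node 3 is isolated.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
w_self = rng.normal(scale=0.1, size=(dim, dim))
w_msg = rng.normal(scale=0.1, size=(dim, dim))
refined = message_passing(states, adj, w_self, w_msg)
print(refined.shape)
```

In this toy setup the isolated node is updated from its own state only, while the connected nodes mix in information from their neighbours at every step. A real system would learn the weights and would likely use gated cells and separate updates for nodes and edges.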


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Activity recognition and understanding
        Scene anomaly detection


Published in
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
General Chairs:
Chang Wen Chen (Chinese University of Hong Kong, Shenzhen, China), Rita Cucchiara (UNIMORE, Italy), Xian-Sheng Hua (Alibaba Group, China)
Program Chairs:
Guo-Jun Qi (Futurewei Technologies, USA), Elisa Ricci (UNITN & Fondazione Bruno Kessler, Italy), Zhengyou Zhang (Tencent, China), Roger Zimmermann (National University of Singapore, Singapore)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
abnormal event detection
context reasoning
spatio-temporal context graph
visual context
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
