ABSTRACT
In this paper, we propose a scene-aware context reasoning method for unsupervised abnormal event detection in videos. The method exploits context information from visual features to bridge the semantic gap between visual context and the meaning of abnormal events. In particular, we build a spatio-temporal context graph to model visual context information, including the appearances of objects, the spatio-temporal relationships among objects, and scene types. The context information is encoded into the nodes and edges of the graph, and their states are iteratively updated by multiple RNNs with message passing for context reasoning. To infer the spatio-temporal context graph in various scenes, we develop a graph-based deep Gaussian mixture model that clusters scenes in an unsupervised manner. We then compute frame-level anomaly scores from the context information to discriminate abnormal events in various scenes. Evaluations on three challenging datasets (UCF-Crime, Avenue, and ShanghaiTech) demonstrate the effectiveness of our method.
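As a rough illustration of how a Gaussian mixture over scene clusters can yield frame-level anomaly scores, the sketch below scores each frame by its negative log-likelihood (energy) under a diagonal-covariance mixture and rescales the energies to [0, 1]. This is not the method of the paper, whose scoring rule is not specified in the abstract: the function names `gmm_energy` and `normalize_scores`, the mixture parameters, and the two-dimensional frame features are all hypothetical and chosen only to keep the example small.

```python
import math


def gmm_energy(x, weights, means, variances):
    """Negative log-likelihood of feature vector x under a diagonal GMM.

    Assumption: each mixture component stands in for one scene cluster.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


def normalize_scores(energies):
    """Min-max rescale energies to [0, 1]; higher means more anomalous."""
    lo, hi = min(energies), max(energies)
    if hi == lo:
        return [0.0 for _ in energies]
    return [(e - lo) / (hi - lo) for e in energies]


# Hypothetical toy setup: two scene clusters and three frame features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [5.2, 4.9], [10.0, -6.0]]

energies = [gmm_energy(f, weights, means, variances) for f in frames]
scores = normalize_scores(energies)
```

Here the first two frames lie close to a cluster mean and receive low scores, while the third lies far from both clusters and receives the highest score. In a real system the features would come from the learned context graph rather than hand-written vectors.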