research-article

Hierarchical Scene Normality-Binding Modeling for Anomaly Detection in Surveillance Videos

Authors:
Qianyue Bao, Fang Liu, Yang Liu, Licheng Jiao, Xu Liu, Lingling Li

Affiliation (all authors): Xidian University; Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education; International Research Center for Intelligent Perception and Computation; Joint International Research Laboratory of Intelligent Perception and Computation, Xi'an, China
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, Pages 6103–6112
https://doi.org/10.1145/3503161.3548199

Published: 10 October 2022

ABSTRACT

Anomaly detection in surveillance videos is an important topic in the multimedia community. It requires efficient extraction of scene context and the capture of temporal information as a basis for decisions. From the perspective of hierarchical modeling, we parse the surveillance scene from global to local and propose a Hierarchical Scene Normality-Binding Modeling framework (HSNBM) for anomaly detection. For the static background hierarchy, we design a Region Clustering-driven Multi-task Memory Autoencoder (RCM-MemAE), which simultaneously performs region segmentation and scene reconstruction. The normal prototypes of each local region are stored, and the frame reconstruction error is subsequently amplified by global memory augmentation. For the dynamic foreground object hierarchy, we employ a Scene-Object Binding Frame Prediction module (SOB-FP) that binds all foreground objects in the frame to the prototypes stored in the background hierarchy according to their positions, thus fully exploiting the normality relationship between foreground and background. The bound features are then fed into the decoder to predict the future movement of the objects. With the binding mechanism between foreground and background, HSNBM effectively integrates the "reconstruction" and "prediction" tasks and builds a semantic bridge between the two hierarchies. Finally, HSNBM fuses the anomaly scores of the two hierarchies to make a comprehensive decision. Extensive empirical studies on three standard video anomaly detection datasets demonstrate the effectiveness of the proposed HSNBM framework.
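The abstract describes two steps that are easy to picture in code: binding each foreground object to the background prototype at its position, and fusing the anomaly scores of the two hierarchies. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the box format, the use of the box center to look up a region, feature concatenation as the binding operation, and the min-max-normalized weighted sum used for fusion are all hypothetical choices, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in region-map coordinates.
Box = Tuple[int, int, int, int]


def bind_objects_to_prototypes(
    boxes: Sequence[Box],
    object_features: Sequence[Sequence[float]],
    region_map: Sequence[Sequence[int]],
    prototypes: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate each object's feature with the prototype of the region
    that lies under the center of its bounding box (assumed binding rule)."""
    bound = []
    for box, feat in zip(boxes, object_features):
        cx = (box[0] + box[2]) // 2
        cy = (box[1] + box[3]) // 2
        region_id = region_map[cy][cx]  # region label at the box center
        bound.append(list(feat) + list(prototypes[region_id]))
    return bound


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(
    recon_errors: Sequence[float],
    pred_errors: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Weighted sum of normalized per-frame reconstruction and prediction
    errors (assumed fusion rule; `weight` is a hypothetical parameter)."""
    r = min_max(recon_errors)
    p = min_max(pred_errors)
    return [weight * a + (1.0 - weight) * b for a, b in zip(r, p)]


if __name__ == "__main__":
    # Toy scene: the left half is region 0, the right half is region 1.
    region_map = [[0, 0, 1, 1], [0, 0, 1, 1]]
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    boxes = [(0, 0, 1, 1), (2, 0, 3, 1)]
    object_features = [[0.5], [0.7]]
    print(bind_objects_to_prototypes(boxes, object_features, region_map, prototypes))
    print(fuse_scores([1.0, 2.0, 3.0], [10.0, 10.0, 30.0]))
```

In this toy setup an object on the left is paired with the prototype of region 0 and an object on the right with the prototype of region 1, so a downstream predictor would see each object together with what is normal for its location. A learned binding or a different fusion rule could replace either function without changing the overall flow.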

Supplemental Material

MM22-fp1861.mp4 (15.2 MB)


Index Terms

Computing methodologies


Published in
MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022, 7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161
General Chairs: João Magalhães (NOVA University of Lisbon, Portugal), Alberto del Bimbo (University of Florence, Italy), Shin'ichi Satoh (National Institute of Informatics, Japan), Nicu Sebe (University of Trento, Italy)
Program Chairs: Xavier Alameda-Pineda (Inria, Grenoble, France), Qin Jin (Renmin University of China, China), Vincent Oria (New Jersey Institute of Technology, USA), Laura Toni (University College London, UK)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
background
foreground
hierarchical modeling
video anomaly detection
