ABSTRACT
Anomaly detection in surveillance videos is an important topic in the multimedia community. It requires efficient extraction of scene context and the capture of temporal information as a basis for decisions. From the perspective of hierarchical modeling, we parse the surveillance scene from global to local and propose a Hierarchical Scene Normality-Binding Modeling framework (HSNBM) to handle anomaly detection. For the static background hierarchy, we design a Region Clustering-driven Multi-task Memory Autoencoder (RCM-MemAE), which simultaneously performs region segmentation and scene reconstruction. The normal prototypes of each local region are stored, and the frame reconstruction error is subsequently amplified by global memory augmentation. For the dynamic foreground object hierarchy, we employ a Scene-Object Binding Frame Prediction module (SOB-FP) to bind all foreground objects in the frame to the prototypes stored in the background hierarchy according to their positions, thereby fully exploiting the normality relationship between foreground and background. The bound features are then fed into the decoder to predict the future movement of the objects. Through this binding mechanism between foreground and background, HSNBM effectively integrates the "reconstruction" and "prediction" tasks and builds a semantic bridge between the two hierarchies. Finally, HSNBM fuses the anomaly scores of the two hierarchies to make a comprehensive decision. Extensive empirical studies on three standard video anomaly detection datasets demonstrate the effectiveness of the proposed HSNBM framework.
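The abstract does not specify how prototypes are read from memory or how the two anomaly scores are fused, so the following is only an illustrative sketch under stated assumptions. It assumes a MemAE-style read (cosine similarity followed by softmax addressing) and a weighted sum of min-max-normalized per-frame scores. The names `memory_read`, `min_max_normalize`, `fuse_scores`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def memory_read(query: Sequence[float],
                prototypes: Sequence[Sequence[float]]) -> List[float]:
    """Assumed MemAE-style read: softmax over cosine similarities,
    then a weighted sum of the stored normal prototypes."""
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b + 1e-12)

    sims = [cosine(query, p) for p in prototypes]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * p[i] for w, p in zip(weights, prototypes))
            for i in range(len(query))]


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale the per-frame scores of one video to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # constant sequence: no frame stands out
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(recon_errors: Sequence[float],
                pred_errors: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Assumed fusion rule: weighted sum of normalized background
    (reconstruction) and foreground (prediction) errors per frame."""
    if len(recon_errors) != len(pred_errors):
        raise ValueError("both score sequences must cover the same frames")
    recon = min_max_normalize(recon_errors)
    pred = min_max_normalize(pred_errors)
    return [weight * r + (1.0 - weight) * p for r, p in zip(recon, pred)]


if __name__ == "__main__":
    # Three frames; the last one has high error in both hierarchies.
    print(fuse_scores([0.0, 1.0, 4.0], [2.0, 2.0, 6.0]))  # [0.0, 0.125, 1.0]
```

In this sketch a higher fused score marks a frame as more likely anomalous. The equal weighting is an arbitrary default, not a value reported by the authors.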
Index Terms
- Hierarchical Scene Normality-Binding Modeling for Anomaly Detection in Surveillance Videos