Abstract
Video anomaly detection (VAD) is a crucial yet highly challenging task in intelligent surveillance systems. Because appearance and motion information are both vital for identifying anomalies, existing unsupervised VAD methods usually learn normality from these two cues. However, these approaches tend to treat appearance and motion separately, or simply integrate them while ignoring the consistency between them, which results in sub-optimal performance. To address this problem, we propose a Memory-Augmented Spatial-Temporal Consistency Network that models the latent consistency between spatial appearance and temporal motion by learning a unified spatiotemporal representation. Additionally, we introduce a spatial-temporal memory fusion module that records spatial and temporal prototypes of regular patterns from the unified spatiotemporal representation, increasing the gap between normal and abnormal events in the feature space. Experimental results on three benchmarks demonstrate the effectiveness of spatial-temporal consistency for VAD. Our method performs comparably to state-of-the-art methods, with AUCs of 97.6%, 89.3%, and 73.3% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively.
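The abstract does not give the equations of the spatial-temporal memory fusion module, so the following is only a rough illustration of how a prototype memory can be read, not the authors' implementation. It assumes cosine-similarity addressing with a softmax over memory items, similar in spirit to the memory in MemAE (Gong et al., ICCV 2019). The two separate memories, the fusion by channel concatenation, the nearest-prototype distance used as an anomaly cue, and all function names (`memory_read`, `nearest_prototype_distance`, `fuse_spatial_temporal`) are hypothetical assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Scale each row to unit L2 norm so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def memory_read(queries: np.ndarray, memory: np.ndarray):
    """Soft-address a memory of prototypes (assumed addressing scheme).

    queries: (N, C) query features, e.g. one per spatial location of a feature map.
    memory:  (M, C) prototype items of normal patterns.
    Returns the (N, C) read-out and the (N, M) addressing weights.
    """
    sim = l2_normalize(queries) @ l2_normalize(memory).T  # (N, M) cosine similarities
    weights = softmax(sim, axis=1)                          # each row sums to 1
    return weights @ memory, weights


def nearest_prototype_distance(queries: np.ndarray, memory: np.ndarray) -> np.ndarray:
    # Squared distance from each normalized query to its closest normalized prototype.
    # Large values mean the query is poorly explained by the memorized normal patterns.
    q, m = l2_normalize(queries), l2_normalize(memory)
    nearest = np.argmax(q @ m.T, axis=1)
    return np.sum((q - m[nearest]) ** 2, axis=1)


def fuse_spatial_temporal(features, spatial_memory, temporal_memory):
    # HYPOTHETICAL fusion: read from a spatial and a temporal memory, then
    # concatenate both read-outs with the query features along the channel axis.
    s_read, _ = memory_read(features, spatial_memory)
    t_read, _ = memory_read(features, temporal_memory)
    return np.concatenate([features, s_read, t_read], axis=1)


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
fused = fuse_spatial_temporal(feats, rng.standard_normal((10, 8)), rng.standard_normal((10, 8)))
print(fused.shape)  # (4, 24)
```

Because the softmax weights are convex, every read-out lies in the span of the stored prototypes; a query from an abnormal event therefore tends to be reconstructed poorly and to sit far from its nearest prototype, which is one way a memory can widen the normal/abnormal gap in feature space.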
This work is partially supported by the National Natural Science Foundation of China (Grant No. 61972016) and the Science and Technology Commission of Shanghai Municipality Research Fund (Grant No. 21JC1405300).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, Z., Zhao, M., Zeng, X., Wang, T., Pang, C. (2024). Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14430. Springer, Singapore. https://doi.org/10.1007/978-981-99-8537-1_8
DOI: https://doi.org/10.1007/978-981-99-8537-1_8
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8536-4
Online ISBN: 978-981-99-8537-1
eBook Packages: Computer Science, Computer Science (R0)