Abstract
Anomaly detection in videos is the challenging task of identifying unexpected occurrences given normal training examples. Most approaches focus on designing elaborate models to mine normal patterns from content-dependent information. However, content-dependent information, which includes various details, has both positive and negative effects. Abundant details around abnormal events help anomaly detection by producing obvious errors, whereas complex details around normal events may cause challenging normal samples to be erroneously detected as anomalies. To alleviate the problem of challenging normal samples, we propose a content-independent image, free of complex details, for all normal samples during the training phase. It serves as a pseudo-label of regular patterns and acts as a normal supporter. Accordingly, we introduce a content-dependent and -independent cross-attention network, termed C\(^2\)Net, which jointly exploits the advantages of content-dependent information and the content-independent pseudo-label. C\(^2\)Net employs a fusion-first-then-separation strategy: it first injects the content-independent pseudo-label supporter into content-dependent frames using an auto-encoder network, and then reconstructs the content-independent pseudo-label supporter and the content-dependent frame separately using siamese sub-networks. Additionally, a novel cross-attention module is designed between the siamese sub-networks to separate the information of the content-independent pseudo-label supporter from that of the content-dependent frame. Experimental results on three public outdoor datasets and a public indoor dataset for cognitive disorder rehabilitation assessment verify the effectiveness of C\(^2\)Net.
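The cross-attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is generic scaled dot-product cross-attention between two feature branches, not the C\(^2\)Net module itself, whose design is not specified in the abstract. The names `frame_feats` and `supporter_feats`, the token shapes, and the absence of learned projections are all assumptions made for illustration. The helper `reconstruction_error` shows the mean-squared error that reconstruction-based detectors commonly use as an anomaly score; whether C\(^2\)Net scores frames in exactly this way is likewise an assumption.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats):
    """Generic scaled dot-product cross-attention (no learned projections).

    query_feats:   (N, d) tokens from one branch (used as queries)
    context_feats: (M, d) tokens from the other branch (used as keys and values)
    Returns an (N, d) array in which each row is a convex combination
    of the rows of context_feats.
    """
    d = query_feats.shape[-1]
    weights = softmax(query_feats @ context_feats.T / np.sqrt(d), axis=-1)  # (N, M)
    return weights @ context_feats


def reconstruction_error(target, reconstruction):
    """Mean squared error between a target and its reconstruction."""
    return float(np.mean((target - reconstruction) ** 2))


# Hypothetical token features from the two siamese branches (assumed shapes).
rng = np.random.default_rng(0)
frame_feats = rng.normal(size=(16, 8))      # content-dependent frame branch
supporter_feats = rng.normal(size=(16, 8))  # content-independent supporter branch

# Each branch attends to the other one.
frame_attended = cross_attention(frame_feats, supporter_feats)
supporter_attended = cross_attention(supporter_feats, frame_feats)
```

In a trained model, the queries, keys, and values would normally pass through learned linear projections; they are omitted here to keep the sketch short.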
Data availability
All the data used in this study are publicly available.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 62371219, 62271221, and 61771233; the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2023A1515011260; the Science and Technology Program of Guangzhou under Grant No. 202201011672; and the SERC Central Research Fund (Use-inspired Basic Research).
Author information
Contributions
Jiafei Liang: Investigation, Software, Writing-original draft, Formal analysis. Yang Xiao: Methodology, Funding acquisition. Joey Tianyi Zhou: Conceptualization, Funding acquisition. Feng Yang: Resources, Funding acquisition. Ting Li: Investigation, Conceptualization, Writing-review. Zhiwen Fang: Conceptualization, Methodology, Writing-review & editing, Funding acquisition, Supervision.
Ethics declarations
Ethical standard
All the data are publicly available. No data requiring ethical approval are used in this paper.
Competing interests
All authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, J., Xiao, Y., Zhou, J.T. et al. C\(^{2}\)Net: content-dependent and -independent cross-attention network for anomaly detection in videos. Appl Intell 54, 1980–1996 (2024). https://doi.org/10.1007/s10489-023-05252-6
DOI: https://doi.org/10.1007/s10489-023-05252-6