C$^{2}$Net: content-dependent and -independent cross-attention network for anomaly detection in videos

Liang, Jiafei; Xiao, Yang; Zhou, Joey Tianyi; Yang, Feng; Li, Ting; Fang, Zhiwen

doi:10.1007/s10489-023-05252-6

Published: 26 January 2024

Applied Intelligence, Volume 54, pages 1980–1996 (2024)

Jiafei Liang^1,2,3,
Yang Xiao^4,
Joey Tianyi Zhou^5,6,
Feng Yang^1,2,3,
Ting Li^7 &
…
Zhiwen Fang^1,2,3,8 (ORCID: orcid.org/0000-0002-9314-5262)

Abstract

Anomaly detection in videos is a challenging task: given normal training examples, a model must identify unexpected events. Most approaches focus on designing elaborate models that mine normal patterns from content-dependent information. However, content-dependent information contains many details, which have both positive and negative effects. The abundant details around abnormal events aid anomaly detection by producing obvious errors, whereas complex details around normal events can cause challenging normal samples to be wrongly detected as anomalies. To alleviate this problem, we propose a content-independent image, free of complex details, for all normal samples during the training phase. It serves as a pseudo-label of regular patterns and acts as a normal supporter. Accordingly, we introduce a content-dependent and -independent cross-attention network, termed C$^{2}$Net, which jointly exploits the advantages of content-dependent information and the content-independent pseudo-label. C$^{2}$Net employs a fusion-first-then-separation strategy: it first injects the content-independent pseudo-label supporter into content-dependent frames using an auto-encoder network, and then reconstructs the pseudo-label supporter and the content-dependent frame separately using siamese sub-networks. Additionally, a novel cross-attention module is designed between the siamese sub-networks to separate the information of the content-independent pseudo-label supporter from that of the content-dependent frame. Experimental results on three public outdoor datasets and a public indoor dataset on cognitive disorder rehabilitation assessment verify the effectiveness of C$^{2}$Net.
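The abstract does not specify how the cross-attention module is built, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two branches, not the authors' design. It assumes each siamese sub-network exposes a small set of feature tokens; the names `frame_tokens` and `supporter_tokens` and the toy values are hypothetical and chosen purely for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: tokens from one branch, shape (n_q, d)
    keys, values: tokens from the other branch, shapes (n_k, d) and (n_k, d_v)
    Returns (outputs, weights) with shapes (n_q, d_v) and (n_q, n_k).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key of the other branch.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the other branch's values.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Hypothetical toy features: 3 tokens from the content-dependent frame branch,
# 2 tokens from the content-independent supporter branch.
frame_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
supporter_tokens = [[0.5, 0.5], [1.0, -1.0]]

# Each branch queries the other, so information can flow in both directions.
frame_from_supporter, w_fs = cross_attention(
    frame_tokens, supporter_tokens, supporter_tokens)
supporter_from_frame, w_sf = cross_attention(
    supporter_tokens, frame_tokens, frame_tokens)
```

In this sketch each branch attends to the other symmetrically. How C$^{2}$Net actually uses attention to separate the supporter's information from the frame's is described in the full paper and is not reproduced here.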

Data availability

All the data used in this study are publicly available.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 62371219, 62271221, and 61771233; the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2023A1515011260; the Science and Technology Program of Guangzhou under Grant No. 202201011672; and the SERC Central Research Fund (Use-inspired Basic Research).

Author information

Authors and Affiliations

School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
Jiafei Liang, Feng Yang & Zhiwen Fang
Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China
Jiafei Liang, Feng Yang & Zhiwen Fang
Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
Jiafei Liang, Feng Yang & Zhiwen Fang
National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Yang Xiao
Centre for Frontier AI Research (CFAR), Agency for Science, Technology, and Research (A*STAR), Singapore, 138632, Singapore
Joey Tianyi Zhou
Institute of High Performance Computing (IHPC), Agency for Science, Technology, and Research (A*STAR), Singapore, 138632, Singapore
Joey Tianyi Zhou
School of Nursing, Southern Medical University, Guangzhou, 510515, China
Ting Li
Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China
Zhiwen Fang

Contributions

Jiafei Liang: Investigation, Software, Writing (original draft), Formal analysis. Yang Xiao: Methodology, Funding acquisition. Joey Tianyi Zhou: Conceptualization, Funding acquisition. Feng Yang: Resources, Funding acquisition. Ting Li: Investigation, Conceptualization, Writing (review). Zhiwen Fang: Conceptualization, Methodology, Writing (review and editing), Funding acquisition, Supervision.

Corresponding authors

Correspondence to Ting Li or Zhiwen Fang.

Ethics declarations

Ethical standard

All data used in this paper are publicly available. The paper involves no data that raise ethical concerns.

Competing interests

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, J., Xiao, Y., Zhou, J.T. et al. C$^{2}$Net: content-dependent and -independent cross-attention network for anomaly detection in videos. Appl Intell 54, 1980–1996 (2024). https://doi.org/10.1007/s10489-023-05252-6

Accepted: 24 December 2023
Published: 26 January 2024
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10489-023-05252-6
