C\(^{2}\)Net: content-dependent and -independent cross-attention network for anomaly detection in videos

Applied Intelligence

Abstract

Anomaly detection in videos is a challenging task: it must identify unexpected events when only normal examples are available for training. Most approaches focus on designing elaborate models that mine normal patterns from content-dependent information. However, content-dependent information, which includes many details, has both positive and negative effects. The abundant details around abnormal events help anomaly detection by producing obvious errors, whereas complex details around normal events can cause challenging normal samples to be falsely detected as anomalies. To alleviate this problem, we propose a content-independent image without complex details that is shared by all normal samples during the training phase. It serves as a pseudo-label of regular patterns, acting as a normal supporter. Accordingly, we introduce a content-dependent and -independent cross-attention network, termed C\(^2\)Net, which jointly exploits the advantages of content-dependent information and the content-independent pseudo-label. C\(^2\)Net employs a fusion-first-then-separation strategy: it first injects the content-independent pseudo-label supporter into content-dependent frames using an auto-encoder network, and then reconstructs the supporter and the frame separately using siamese sub-networks. Additionally, a novel cross-attention module is designed between the siamese sub-networks to separate the information of the supporter from that of the frame. Experimental results on three public outdoor datasets and one public indoor dataset for cognitive disorder rehabilitation assessment verify the effectiveness of C\(^2\)Net.
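The abstract does not specify how the cross-attention module is built, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two feature sets, such as those produced by the two siamese branches. It is not the authors' module. The function names, the feature sizes, and the choice to reuse the other branch's features as both keys and values (with no learned projections) are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries:     list of n_q feature vectors (dimension d) from one branch.
    keys_values: list of n_k feature vectors (dimension d) from the other
                 branch, used as both keys and values (a simplification).
    Returns (attended, weights): attended has n_q vectors of dimension d,
    weights has n_q rows of n_k attention weights that each sum to 1.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    attended, weights = [], []
    for q in queries:
        # Similarity of this query to every key in the other branch.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in keys_values]
        row = softmax(scores)
        # Weighted average of the other branch's feature vectors.
        out = [sum(w * k[j] for w, k in zip(row, keys_values))
               for j in range(d)]
        attended.append(out)
        weights.append(row)
    return attended, weights


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical features: 6 positions x 8 channels per branch.
    frame_feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]
    supporter_feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]

    # Each branch attends to the other, so information flows both ways.
    frame_ctx, _ = cross_attention(frame_feats, supporter_feats)
    supporter_ctx, _ = cross_attention(supporter_feats, frame_feats)
    print(len(frame_ctx), len(frame_ctx[0]))
```

In a trained network the queries, keys and values would normally come from learned linear projections, and how the attended features are combined with each branch to achieve the separation described above is left open here.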

Data availability

All the data used in this study are publicly available.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 62371219, 62271221, and 61771233; the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2023A1515011260; the Science and Technology Program of Guangzhou under Grant No. 202201011672; and the SERC Central Research Fund (Use-inspired Basic Research).

Author information

Contributions

Jiafei Liang: Investigation, Software, Writing-original draft, Formal analysis. Yang Xiao: Methodology, Funding acquisition. Joey Tianyi Zhou: Conceptualization, Funding acquisition. Feng Yang: Resources, Funding acquisition. Ting Li: Investigation, Conceptualization, Writing-review. Zhiwen Fang: Conceptualization, Methodology, Writing-review & editing, Funding acquisition, Supervision.

Corresponding authors

Correspondence to Ting Li or Zhiwen Fang.

Ethics declarations

Ethical standard

All the data used are publicly available. No data requiring ethical approval are used in this paper.

Competing interests

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, J., Xiao, Y., Zhou, J.T. et al. C\(^{2}\)Net: content-dependent and -independent cross-attention network for anomaly detection in videos. Appl Intell 54, 1980–1996 (2024). https://doi.org/10.1007/s10489-023-05252-6

  • DOI: https://doi.org/10.1007/s10489-023-05252-6
