Attention-based framework for weakly supervised video anomaly detection

The Journal of Supercomputing

Abstract

Video anomaly detection (VAD) automatically recognizes abnormal events in surveillance videos. Existing works have made advances in recognizing whether a video contains abnormal events; however, they cannot temporally localize the abnormal events within videos. This paper presents a novel anomaly attention-based framework for accurately localizing abnormal events in time. With the proposed framework, we can achieve frame-level VAD using video-level labels, which significantly reduces the burden of data annotation. Our method is an end-to-end deep neural network-based approach that contains three modules: an anomaly attention module (AAM), a discriminative anomaly attention module (DAAM) and a generative anomaly attention module (GAAM). Specifically, AAM is trained to generate the anomaly attention, which measures the abnormal degree of each frame, while DAAM and GAAM are used to alternately augment AAM from two different aspects. On the one hand, DAAM enhances AAM by optimizing video-level classification. On the other hand, GAAM adopts a conditional variational autoencoder to model the likelihood of each frame given the attention, which refines AAM. As a result, AAM generates higher anomaly scores for abnormal frames and lower anomaly scores for normal frames. Experimental results show that our proposed approach outperforms state-of-the-art methods, which validates the superiority of AAVAD.
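
The abstract does not give layer definitions or loss formulas, so the following is only a minimal sketch of the general idea behind training per-frame attention from a video-level label. It assumes a linear attention scorer, attention-weighted average pooling, and a logistic video-level classifier; the names `frame_attention`, `attention_pool`, `video_level_loss`, `w_att` and `w_cls` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def frame_attention(features: Sequence[Sequence[float]],
                    w_att: Sequence[float]) -> List[float]:
    """Per-frame attention in (0, 1), read here as an anomaly score.

    Assumption: a single linear scorer followed by a sigmoid.
    """
    return [sigmoid(dot(f, w_att)) for f in features]


def attention_pool(features: Sequence[Sequence[float]],
                   attention: Sequence[float]) -> List[float]:
    """Attention-weighted average of frame features (one vector per video)."""
    total = sum(attention)
    dim = len(features[0])
    return [
        sum(a * f[d] for a, f in zip(attention, features)) / total
        for d in range(dim)
    ]


def video_level_loss(features: Sequence[Sequence[float]],
                     w_att: Sequence[float],
                     w_cls: Sequence[float],
                     label: int) -> float:
    """Binary cross-entropy between the video-level prediction and its label.

    Only the video-level label (0 = normal, 1 = abnormal) is needed; the
    attention is shaped indirectly because it decides which frames
    dominate the pooled feature.
    """
    attention = frame_attention(features, w_att)
    pooled = attention_pool(features, attention)
    p = sigmoid(dot(pooled, w_cls))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

In a sketch like this, minimizing the video-level loss pushes attention toward the frames that make the video classifiable, which is why the attention values can later be read as frame-level anomaly scores. The generative branch described in the abstract (a conditional variational autoencoder over frames given the attention) is not covered here.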


Acknowledgements

This research was funded by the Scientific Research and Innovation Team Foundation of Zhejiang Business Technology Institute under Grant KYTD202103, and by the Scientific Research Project of Zhejiang Provincial Department of Education under Grant Y202147736.

Author information

Corresponding author

Correspondence to Hualin Ma.

About this article

Cite this article

Ma, H., Zhang, L. Attention-based framework for weakly supervised video anomaly detection. J Supercomput 78, 8409–8429 (2022). https://doi.org/10.1007/s11227-021-04190-9

  • DOI: https://doi.org/10.1007/s11227-021-04190-9
