
A novel two-stream structure for video anomaly detection in smart city management

The Journal of Supercomputing

Abstract

Video anomaly detection is the problem of detecting unusual events in videos. The challenges of this task lie mainly in three aspects. First, unusual events tend to make up only a very small portion of a video, so a large amount of irrelevant information must be discarded, which places further demands on both algorithm performance and the computing capacity of devices. Second, anomaly detection techniques are typically deployed in surveillance systems, which produce massive amounts of video data that are difficult to analyze. Last, the algorithm requires strong feature extraction ability, since anomalous video streams may lie close to normal ones. Benefiting from the development of deep learning in computer vision, the accuracy and efficiency of video anomaly detection have improved considerably in recent years. In this paper, we present a newly developed two-stream deep learning model, which uses a 3D convolutional neural network (C3D) structure as the feature extraction part, to handle this task. The model takes both the sequence of frames and the optical flow as input. Features of these two streams are then extracted by C3D and a traditional convolutional neural network (CNN). Finally, a fusion layer fuses the results of both streams and generates the final detection. In experiments on the UCF-Crime video dataset, our method outperforms other benchmark methods, such as traditional deep CNN and long short-term memory (LSTM), in terms of area under the curve (AUC). Our proposed method achieves an AUC of 85.18%, which is 3% higher than the second-best method.
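To make the fusion and evaluation steps concrete, the following is a minimal sketch, not the model described above. It assumes each stream has already produced one anomaly score per clip, and it swaps the learned fusion layer for a simple weighted average. The names `fuse_scores`, `roc_auc`, and `rgb_weight` are hypothetical, and the scores are made-up toy values, not results from the paper.

```python
from typing import List, Sequence


def fuse_scores(rgb_scores: Sequence[float],
                flow_scores: Sequence[float],
                rgb_weight: float = 0.5) -> List[float]:
    """Late fusion (assumed): weighted average of per-clip scores from two streams."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must score the same clips")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    return [rgb_weight * r + (1.0 - rgb_weight) * f
            for r, f in zip(rgb_scores, flow_scores)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is equivalent to
    the area under the ROC curve and is fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal clip")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical): 1 marks an anomalous clip, 0 a normal clip.
labels = [0, 0, 1, 1, 0, 1]
rgb_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]   # frame-stream scores
flow_scores = [0.20, 0.30, 0.60, 0.90, 0.50, 0.40]  # optical-flow-stream scores

fused_scores = fuse_scores(rgb_scores, flow_scores)
print("frame stream AUC:", roc_auc(labels, rgb_scores))
print("flow stream AUC: ", roc_auc(labels, flow_scores))
print("fused AUC:       ", roc_auc(labels, fused_scores))
```

In this toy case each stream misranks a different normal clip, so averaging the two corrects both mistakes. That is the general motivation for combining appearance and motion cues, although a learned fusion layer can weight the streams far more flexibly than a fixed average.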





Acknowledgements

This article is supported by Xi’an Jiaotong-Liverpool University (XJTLU), Suzhou, China, with the Research Development Fund (RDF-15-01-01). Ka Lok Man wishes to thank the AI University Research Centre (AI-URC), Xi’an Jiaotong-Liverpool University, Suzhou, China, for supporting his related research contributions to this article through the XJTLU Key Programme Special Fund (KSF-E-65) and Suzhou-Leuven IoT & AI Cluster Fund.

Author information

Corresponding author

Correspondence to Ka Lok Man.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, Y., Man, K.L., Smith, J. et al. A novel two-stream structure for video anomaly detection in smart city management. J Supercomput 78, 3940–3954 (2022). https://doi.org/10.1007/s11227-021-04007-9


  • DOI: https://doi.org/10.1007/s11227-021-04007-9
