
Perceptual metric learning for video anomaly detection

  • Original Paper
  • Machine Vision and Applications

Abstract

This work introduces a new approach to localizing anomalies in surveillance video. Its main novelty is the use of a Siamese convolutional neural network to learn a distance metric between pairs of video patches (spatiotemporal regions of video). The learned metric, which is not specific to the target video, is used to measure the perceptual distance between each video patch in the test video and the video patches found in normal training video. A test video patch that is far from all normal video patches is labeled anomalous. We further generalize the approach from video patches on a fixed grid to arbitrary-sized region proposals. We compare our approaches with previously published algorithms using four evaluation measures on three challenging target benchmark datasets. Experiments show that our approaches either surpass or perform comparably to current state-of-the-art methods while enjoying other favorable properties.
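
The scoring rule described in the abstract (a test patch is anomalous when it is far from every normal training patch under a learned metric) can be sketched as a nearest-neighbor search. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: patches are assumed to be already flattened to feature vectors, `embed` is a hypothetical stand-in for one branch of the trained Siamese CNN (it only L2-normalizes its input), and `THRESHOLD = 0.5` is an arbitrary illustrative value. None of these names come from the paper.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical: each video patch is assumed to be pre-flattened to a feature vector.
Patch = Sequence[float]

# Hypothetical decision threshold; in practice it would be tuned on held-out data.
THRESHOLD = 0.5


def embed(patch: Patch) -> List[float]:
    """Hypothetical stand-in for one branch of a Siamese CNN.

    A trained network would map the spatiotemporal patch through learned
    convolutional weights; this placeholder only L2-normalizes the vector
    so the sketch runs with the standard library alone.
    """
    norm = math.sqrt(sum(x * x for x in patch)) or 1.0  # avoid dividing by zero
    return [x / norm for x in patch]


def siamese_distance(a: Patch, b: Patch) -> float:
    """Pass both patches through the same (shared-weight) embedding,
    then return their Euclidean distance in embedding space."""
    ea, eb = embed(a), embed(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))


def anomaly_score(
    test_patch: Patch,
    normal_patches: Sequence[Patch],
    distance: Callable[[Patch, Patch], float] = siamese_distance,
) -> float:
    """Distance from the test patch to its nearest normal training patch."""
    if not normal_patches:
        raise ValueError("need at least one normal training patch")
    return min(distance(test_patch, n) for n in normal_patches)


def is_anomalous(score: float, threshold: float = THRESHOLD) -> bool:
    """A patch far from all normal patches (score above threshold) is flagged."""
    return score > threshold


if __name__ == "__main__":
    # Toy "normal" patches and two test patches (illustrative values only).
    normal = [[1.0, 0.0], [0.9, 0.1]]
    for patch in ([2.0, 0.0], [0.0, 1.0]):
        score = anomaly_score(patch, normal)
        print(patch, round(score, 3), is_anomalous(score))
```

Because both inputs pass through the same `embed` function, the distance is symmetric, mirroring the shared weights of a Siamese network. A real learned metric could be plugged in by passing a different `distance` callable to `anomaly_score`; for large sets of normal patches, an approximate nearest-neighbor index could replace the linear scan.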

Acknowledgements

The authors would like to thank Zexi Chen and Benjamin Dutton of the STAC lab at NC State University for stimulating discussions. Funding was provided by Mitsubishi Electric Research Laboratories.

Author information

Corresponding author

Correspondence to Michael Jones.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

About this article

Cite this article

Ramachandra, B., Jones, M. & Vatsavai, R.R. Perceptual metric learning for video anomaly detection. Machine Vision and Applications 32, 63 (2021). https://doi.org/10.1007/s00138-021-01187-5

  • DOI: https://doi.org/10.1007/s00138-021-01187-5
