Perceptual metric learning for video anomaly detection

Ramachandra, Bharathkumar; Jones, Michael; Vatsavai, Ranga Raju

doi:10.1007/s00138-021-01187-5


Original Paper
Published: 22 March 2021

Volume 32, article number 63 (2021)

Machine Vision and Applications

Bharathkumar Ramachandra¹,
Michael Jones² (ORCID: orcid.org/0000-0001-5215-2346) &
Ranga Raju Vatsavai¹


Abstract

This work introduces a new approach to localizing anomalies in surveillance video. The main novelty is the use of a Siamese convolutional neural network to learn a distance metric between pairs of video patches (spatiotemporal regions of video). The learned metric, which is not specific to the target video, is used to measure the perceptual distance between each video patch in the test video and the video patches found in normal training video. A test video patch that is far from all normal video patches is deemed anomalous. We further generalize the approach from video patches on a fixed grid to arbitrary-sized region proposals. We compare our approaches to previously published algorithms using four evaluation measures on three challenging target benchmark datasets. Experiments show that our approaches either surpass or perform comparably to current state-of-the-art methods while enjoying other favorable properties.
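The scoring rule stated in the abstract (a test patch that is far from all normal patches is anomalous) amounts to a nearest-neighbor search over normal exemplars. The following is a minimal sketch under stated assumptions, not the authors' implementation: patches are assumed to be flattened into equal-length numeric vectors, plain Euclidean distance (`math.dist`) stands in for the learned Siamese metric, and the names `anomaly_score`, `flag_anomalies` and the `threshold` value are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical representation: a spatiotemporal patch flattened into a vector.
Patch = Sequence[float]
Distance = Callable[[Patch, Patch], float]


def euclidean(a: Patch, b: Patch) -> float:
    """Stand-in distance; NOT the learned Siamese metric from the paper."""
    return math.dist(a, b)


def anomaly_score(test_patch: Patch,
                  normal_patches: List[Patch],
                  distance: Distance = euclidean) -> float:
    """Distance from a test patch to its nearest normal training patch."""
    return min(distance(test_patch, n) for n in normal_patches)


def flag_anomalies(test_patches: List[Patch],
                   normal_patches: List[Patch],
                   threshold: float,
                   distance: Distance = euclidean) -> List[bool]:
    """Flag each test patch whose nearest-normal distance exceeds the threshold."""
    return [anomaly_score(p, normal_patches, distance) > threshold
            for p in test_patches]


if __name__ == "__main__":
    normal = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # toy "normal" exemplars
    test = [[0.1, 0.0, 0.0], [5.0, 5.0, 5.0]]    # one near-normal, one far away
    print(flag_anomalies(test, normal, threshold=1.0))  # prints [False, True]
```

Because the distance function is passed in as a parameter, a learned metric could replace the Euclidean stand-in without changing the scoring logic. The exhaustive `min` over all exemplars is shown only for clarity; a large exemplar set would usually call for an approximate nearest-neighbor index.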



Acknowledgements

The authors would like to thank Zexi Chen and Benjamin Dutton of the STAC lab at NC State University for stimulating discussions. Funding was provided by Mitsubishi Electric Research Laboratories.

Author information

Authors and Affiliations

Department of Computer Science, North Carolina State University, 890 Oval Dr, Box 8206, Raleigh, NC, 27695, USA
Bharathkumar Ramachandra & Ranga Raju Vatsavai
Mitsubishi Electric Research Laboratories, 201 Broadway, 8th floor, Cambridge, MA, 02139, USA
Michael Jones


Corresponding author

Correspondence to Michael Jones.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.


About this article

Cite this article

Ramachandra, B., Jones, M. & Vatsavai, R.R. Perceptual metric learning for video anomaly detection. Machine Vision and Applications 32, 63 (2021). https://doi.org/10.1007/s00138-021-01187-5


Received: 07 May 2020
Revised: 02 December 2020
Accepted: 12 February 2021
Published: 22 March 2021
DOI: https://doi.org/10.1007/s00138-021-01187-5
