Abstract
This work introduces a new approach to localizing anomalies in surveillance video. The main novelty is the use of a Siamese convolutional neural network to learn a metric between pairs of video patches (spatiotemporal regions of video). The learned metric, which is not specific to the target video, measures the perceptual distance between each video patch in the testing video and the video patches found in normal training video. If a testing video patch is far from all normal video patches, it is deemed anomalous. We further generalize the approach from video patches on a fixed grid to arbitrary-sized region proposals. We compare our approaches to previously published algorithms using four evaluation measures and three challenging target benchmark datasets. Experiments show that our approaches either surpass or perform comparably to current state-of-the-art methods while enjoying other favorable properties.
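The scoring rule described in the abstract (a testing patch is anomalous when it is far from every normal patch) can be illustrated with a minimal sketch. This is not the authors' implementation: the Siamese network is replaced by a plain Euclidean distance as a stand-in, patches are represented as flat lists of floats, and the names `Patch`, `euclidean`, `anomaly_score`, `is_anomalous`, and the `threshold` parameter are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical representation: a video patch flattened to a list of floats.
Patch = Sequence[float]


def euclidean(a: Patch, b: Patch) -> float:
    """Stand-in distance. The paper learns this metric with a Siamese CNN;
    Euclidean distance is used here only so the sketch is runnable."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def anomaly_score(test_patch: Patch,
                  normal_patches: Sequence[Patch],
                  distance: Callable[[Patch, Patch], float] = euclidean) -> float:
    """Return the distance from the test patch to its nearest normal patch."""
    if not normal_patches:
        raise ValueError("need at least one normal patch")
    return min(distance(test_patch, p) for p in normal_patches)


def is_anomalous(test_patch: Patch,
                 normal_patches: Sequence[Patch],
                 threshold: float,
                 distance: Callable[[Patch, Patch], float] = euclidean) -> bool:
    """Flag the patch when even its nearest normal patch is farther than threshold."""
    return anomaly_score(test_patch, normal_patches, distance) > threshold


if __name__ == "__main__":
    # Toy "normal" patches; real patches would be spatiotemporal video regions.
    normal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(anomaly_score([0.1, 0.0], normal))
    print(is_anomalous([5.0, 5.0], normal, threshold=1.0))
```

Passing a different callable as `distance` is where a learned metric would plug in; the threshold is an assumed free parameter that would have to be chosen for each deployment.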
Acknowledgements
The authors would like to thank Zexi Chen and Benjamin Dutton of the STAC lab at NC State University for relevant stimulating discussions. Funding was provided by Mitsubishi Electric Research Laboratories.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
About this article
Cite this article
Ramachandra, B., Jones, M. & Vatsavai, R.R. Perceptual metric learning for video anomaly detection. Machine Vision and Applications 32, 63 (2021). https://doi.org/10.1007/s00138-021-01187-5
DOI: https://doi.org/10.1007/s00138-021-01187-5