Abstract
Identifying unusual crowd events in video surveillance is challenging, laborious, and error-prone. To address these challenges, we propose a novel end-to-end deep learning architecture based on the Stacked Denoising Auto-Encoder (SDAE), called DeepSDAE. It comprises an SDAE, VGG16, and a plane-based one-class Support Vector Machine (PSVM), and detects anomalies such as stationary people in an active scene or loitering in a crowded scene. DeepSDAE is a hybrid architecture consisting of a four-layered SDAE and an enhanced convolutional neural network (CNN) model. The framework employs reinforcement learning to optimise the learning parameters for crowd anomaly detection, using a Markov Decision Process (MDP) with Deep Q-learning to find the optimal Q value. We also present a late fusion procedure that combines individual decisions from the intermediate and final layers of the SDAE and VGG16 networks to detect different anomalies. Experiments on four real-world datasets show the superior performance of the proposed framework in detecting anomalies at both the frame level and the pixel level.
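As a rough illustration of the late fusion idea mentioned above, the following Python sketch combines per-frame anomaly scores from several detectors into one decision per frame. The abstract does not specify the fusion rule, so everything here is an assumption made for illustration: the detector names, the min-max normalisation, the weighted average, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional


def min_max_normalise(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def late_fusion(
    scores_by_detector: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> List[bool]:
    """Fuse per-frame anomaly scores from several detectors.

    Each detector supplies one score per frame (higher = more anomalous).
    Scores are normalised per detector, combined by a weighted average,
    and thresholded to give one decision per frame.
    """
    names = list(scores_by_detector)
    if not names:
        raise ValueError("at least one detector is required")
    n_frames = len(scores_by_detector[names[0]])
    if any(len(scores_by_detector[n]) != n_frames for n in names):
        raise ValueError("all detectors must score the same frames")
    if weights is None:
        weights = {n: 1.0 for n in names}  # equal weights (assumption)
    total = sum(weights[n] for n in names)
    normalised = {n: min_max_normalise(scores_by_detector[n]) for n in names}
    fused = [
        sum(weights[n] * normalised[n][i] for n in names) / total
        for i in range(n_frames)
    ]
    return [f > threshold for f in fused]


# Hypothetical scores for four frames from three detectors.
decisions = late_fusion({
    "sdae_intermediate": [0.1, 0.2, 0.9, 0.3],
    "sdae_final": [1.0, 1.5, 4.0, 1.2],
    "vgg16_final": [-2.0, -1.0, 3.0, -1.5],
})
print(decisions)  # [False, False, True, False]
```

Normalising each detector separately keeps a detector with a large score range from dominating the average; other fusion rules, such as majority voting over per-detector decisions, would fit the same interface.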










Notes
Sequence of states, actions and rewards that reach a terminal state
Acknowledgments
The authors are very grateful to the Editor and the anonymous reviewers for their valuable comments and suggestions, which greatly improved the presentation and quality of this paper. This work was supported by the Natural Science Foundation of China under Grant 12201523 and by the Fundamental Research Funds for the Central Universities under Grant No. 2682021CX078.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, M., Tian, S., Rao, A.S. et al. An efficient deep neural model for detecting crowd anomalies in videos. Appl Intell 53, 15695–15710 (2023). https://doi.org/10.1007/s10489-022-04233-5
DOI: https://doi.org/10.1007/s10489-022-04233-5