Abstract
Action recognition is an active research area in computer vision with numerous applications, among which recognizing violent actions is of great importance because it is closely related to safety and security. An intelligent surveillance system automatically recognizes suspicious activities in surveillance videos and thereby helps security personnel take the right action at the right time. In this area, most researchers have focused on people detection and tracking, loitering, etc., whereas detecting violent actions or fights is a comparatively less studied problem. Previous works relied on local spatiotemporal feature extractors; however, these carry the overhead of complex optical flow estimation. Although the temporal derivative is a fast alternative to optical flow, on its own it gives very low accuracy and scale-dependent results. Hence, we propose a cascaded method of violence detection based on motion boundary SIFT (MoBSIFT) and movement filtering. In this method, surveillance videos are first passed through a movement filtering algorithm based on the temporal derivative, which keeps most nonviolent actions from going through feature extraction. Only the frames that pass the filter proceed to feature extraction. In addition to the scale-invariant feature transform (SIFT) and histogram of optical flow features, a motion boundary histogram is extracted, and the three are combined to form the MoBSIFT descriptor. The experimental results show that the proposed MoBSIFT outperforms existing methods in accuracy owing to its high tolerance to camera movements. The time complexity is also shown to be reduced by using movement filtering together with MoBSIFT.
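The movement filtering stage described above can be illustrated with a minimal sketch. It assumes that the temporal derivative is approximated by the absolute difference between consecutive grayscale frames and that a frame passes the filter when a sufficient fraction of its pixels has changed. The function name `movement_filter` and the thresholds `pixel_thresh` and `area_thresh` are hypothetical and are not taken from the paper; the actual filtering rule may differ.

```python
import numpy as np

def movement_filter(frames, pixel_thresh=25, area_thresh=0.01):
    """Return indices of frames that show enough motion to merit feature extraction.

    frames: sequence of grayscale frames (2-D uint8 arrays of equal shape).
    pixel_thresh, area_thresh: hypothetical thresholds, not values from the paper.
    """
    kept = []
    for i in range(1, len(frames)):
        # Temporal derivative approximated by the absolute frame difference.
        # Cast to a signed type first so the subtraction cannot wrap around.
        diff = np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16))
        # Fraction of pixels whose intensity changed noticeably.
        moving_fraction = np.mean(diff > pixel_thresh)
        if moving_fraction >= area_thresh:
            kept.append(i)
    return kept

# Toy input: two identical frames followed by one with a changed region.
static = np.zeros((4, 4), dtype=np.uint8)
moving = static.copy()
moving[0, :2] = 200
kept = movement_filter([static, static, moving])
```

In a cascade of this kind, only the frames whose indices are returned would go on to the more expensive SIFT, optical flow, and motion boundary computations, which is where the reduction in processing time would come from.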
References
de Souza FD, Chavez GC, do Valle EA, de A Araujo A (2010) Violence detection in video using spatio-temporal features. In: 23rd SIBGRAPI conference on graphics, patterns and images, pp 224–230
Deniz O, Serrano I, Bueno G, Kim T-K (2014) Fast violence detection in video. In: VISAPP 2014 proceedings of the 9th international conference on computer vision theory and applications, pp 478–485
Bermejo E, Deniz O, Bueno G, Sukthankar R (2011) Violence detection in video using computer vision techniques. In: Proceedings of the 14th international conference on computer analysis of images and patterns. Springer, pp 332–339
Ke S-R, Thuc H, Lee Y-J et al (2013) A review on video-based human activity recognition. Computers 2:88–131. https://doi.org/10.3390/computers2020088
Chen M, Hauptmann A (2009) MoSIFT: recognizing human actions in surveillance videos. Technical report, Carnegie Mellon University, Pittsburgh, USA
Dalal N, Triggs B, Schmid C (2006) Human detection using oriented histograms of flow and appearance. In: Proceedings of 9th ECCV, pp 428–441
Giannakopoulos T, Kosmopoulos D, Aristidou A, Theodoridis S (2006) Violence content classification using audio features. In: Proceedings of the 4th Hellenic conference on advances in artificial intelligence. Springer, pp 502–507
Gong Y, Wang W, Jiang S, Huang Q, Gao W (2008) Detecting violent scenes in movies by auditory and visual cues. In: Proceedings of the 9th Pacific Rim conference on multimedia. Springer, Berlin, Heidelberg, pp 317–326
Lin J, Wang W (2009) Weakly-supervised violence detection in movies with audio and video based cotraining. In: Proceedings of the 10th Pacific Rim conference on multimedia. Springer, Berlin, Heidelberg, pp 930–935
Nam J, Alghoniemy M, Tewfik AH (1998) Audio-visual content-based violent scene characterization. In: Proceedings 1998 international conference on image processing. ICIP98 (Cat. No. 98CB36269). IEEE Comput. Soc, Chicago, USA, pp 353–357
Cheng W, Chu W, Ling J (2003) Semantic context detection based on hierarchical audio models. In: Proceedings of the ACM SIGMM workshop on multimedia information retrieval, pp 109–115
Giannakopoulos T, Makris A, Kosmopoulos D, Perantonis S, Theodoridis S (2010) Audio-visual fusion for detecting violent scenes in videos. In: Artificial intelligence: theories, models and applications, pp 91–100
Chen L-H, Hsu H-W, Wang L-Y, Su C-W (2011) Violence detection in movies. In: 2011 Eighth international conference computer graphics, imaging and visualization. IEEE Comput. Soc, Washington, DC, USA, pp 119–124
Clarin C, Dionisio J, Echavez M, Naval P (2005) DOVE: Detection of movie violence using motion intensity analysis on skin and blood. Technical report, University of the Philippines
Zajdel W, Krijnders JD, Andringa T, Gavrila DM (2007) CASSANDRA: audio-video sensor fusion for aggression detection. In: 2007 IEEE conference on advanced video and signal based surveillance, pp 200–205
Datta A, Shah M, Da Vitoria Lobo N (2002) Person-on-person violence detection in video data. In: 16th international conference on pattern recognition, pp 433–438
Yun K, Honorio J, Chattopadhyay D, Berg TL, Samaras D (2012) Two-person interaction detection using body-pose features and multiple instance learning. In: IEEE computer society conference on computer vision and pattern recognition workshops, pp 28–35
Gao Z, Nie W, Liu A, Zhang H (2016) Evaluation of local spatial–temporal features for cross-view action recognition. Neurocomputing 173:110–117
Xu L, Gong C, Yang J, Wu Q, Yao L (2014) Violent video detection based on MoSIFT feature and sparse coding. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 3538–3542
Hassner T, Itcher Y, Kliper-Gross O (2012) Violent flows: real-time detection of violent crowd behavior. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, Providence, USA, pp 1–6
Mousavi H, Mohammadi S, Perina A, Chellali R, Murino V (2015) Analyzing tracklets for the detection of abnormal crowd behavior. In: IEEE winter conference on applications of computer vision, pp 148–15
Colque RVHM, Junior CAC, Schwartz WR (2015) Histograms of optical flow orientation and magnitude to detect anomalous events in videos. In: 28th SIBGRAPI conference on graphics, patterns and images, pp 126–133
Gao Y, Liu H, Sun X, Wang C, Liu Y (2016) Violence detection using oriented violent flows. Image Vis Comput 48–49:37–41
Zhang T, Yang Z, Jia W, Yang B, Yang J, He X (2016) A new method for violence detection in surveillance scenes. Multimed Tools Appl 75:7327–7349
Zhang T, Jia W, He X, Yang J (2017) Discriminative dictionary learning with motion weber local descriptor for violence detection. IEEE Trans Circuits Syst Video Technol 27(3):696–709
Senst T, Eiselein V, Kuhn A, Sikora T (2017) Crowd violence detection using global motion-compensated lagrangian features and scale-sensitive video-level representation. IEEE Trans Inf Forensics Secur 12(12):2945–2956
Mabrouk AB, Zagrouba E (2017) Spatio-temporal feature using optical flow based distribution for violence detection. Pattern Recognit Lett 92:62–67
Gracia IS, Suarez OD, Garcia GB, Kim T-K (2015) Fast fight detection. PLoS ONE 10(4):e0120448. https://doi.org/10.1371/journal
Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: 17th international conference on pattern recognition (ICPR’04), IEEE Comp. Soc. Washington, DC, USA, vol 3, pp 32–36
Gorelick L, Blank M, Shechtman E, Irani M, Basri R (2007) Actions as space-time shapes. IEEE Trans Pattern Anal Mach Intell 29(12):2247–2253
Weinland D, Ronfard R, Boyer E (2006) Free viewpoint action recognition using motion history volumes. Comput Vis Image Underst 104:249–257
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Paul M, Haque SME, Chakraborty S (2013) Human detection in surveillance videos and its applications: a review. EURASIP J Adv Signal Process 2013:176
Wang H, Klaser A, Schmid C, Liu C-L (2011) Action recognition by dense trajectories. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Colorado Springs, USA, pp 3169–3176
Liu M, Wang M, Wang J, Li D (2013) Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: application to the recognition of orange beverage and Chinese vinegar. Sens Actuators B 177:970–980
Lorena AC, Jacintho Luis FO, Siqueira MF, De Giovanni R, Lohmann LG, de André CPLF, Carvalho MY (2011) Comparing machine learning classifiers in potential distribution modelling. Expert Syst Appl 38:5268–5275
Acknowledgements
I extend my gratitude to Govt. Model Engineering College for providing all support for this work. I also thank Bermejo et al. [3] for making the Movies and Hockey datasets freely available.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Febin, I.P., Jayasree, K. & Joy, P.T. Violence detection in videos for an intelligent surveillance system using MoBSIFT and movement filtering algorithm. Pattern Anal Applic 23, 611–623 (2020). https://doi.org/10.1007/s10044-019-00821-3
DOI: https://doi.org/10.1007/s10044-019-00821-3