Abstract
Typical hand-crafted features are insufficient for reliably detecting violent actions. In this paper, we present a violence detection model based on a bi-channel convolutional neural network (CNN) and the support vector machine (SVM). The major contributions are twofold: (1) we feed the original frames and the differential images into the two channels of the proposed CNN to obtain appearance features and motion features, respectively; (2) linear SVMs are adopted to classify the features, and a label fusion approach is proposed to improve detection performance by integrating the appearance and motion information. We compared the proposed model with several state-of-the-art methods on two datasets. The results are promising, and the proposed method can achieve real-time performance of 30 fps.
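The data flow described in the abstract can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of pixel intensities, computes differential images as the absolute difference of consecutive frames (one common definition; the paper's exact preprocessing may differ), scores a feature vector with a trained linear SVM's decision function w · x + b, and fuses the two channels' decisions. The CNN feature extractors are omitted, and the fusion rule `fuse_labels` with its weight `alpha` is a hypothetical weighted-sum scheme chosen for illustration, not the label fusion approach proposed by the authors.

```python
from typing import List, Sequence

# Assumed representation: a grayscale frame as rows of pixel intensities (0-255).
Frame = List[List[int]]


def differential_images(frames: Sequence[Frame]) -> List[Frame]:
    """Absolute difference between each pair of consecutive frames.

    N input frames yield N - 1 differential images. In the bi-channel setup,
    these would feed the motion channel, while the raw frames feed the
    appearance channel.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([
            [abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return diffs


def linear_svm_score(features: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Decision value of a trained linear SVM: w . x + b (positive => violent)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def fuse_labels(appearance_score: float,
                motion_score: float,
                alpha: float = 0.5) -> int:
    """Hypothetical late-fusion rule: weighted sum of two SVM decision values.

    alpha is an assumed weight for the appearance channel, not a value taken
    from the paper. Returns 1 (violent) if the fused score is positive, else 0.
    """
    fused = alpha * appearance_score + (1.0 - alpha) * motion_score
    return 1 if fused > 0 else 0
```

For example, three 2×2 frames produce two differential images, and a clip whose appearance SVM score is 0.8 and motion SVM score is -0.2 is labeled violent (1) under equal weighting. Fusing decision values rather than hard labels lets a confident channel outweigh an uncertain one; a simple OR or AND over the two hard labels would be an equally plausible alternative.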
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Xia, Q., Zhang, P., Wang, J., Tian, M., Fei, C. (2018). Real Time Violence Detection Based on Deep Spatio-Temporal Features. In: Zhou, J., et al. Biometric Recognition. CCBR 2018. Lecture Notes in Computer Science, vol 10996. Springer, Cham. https://doi.org/10.1007/978-3-319-97909-0_17
DOI: https://doi.org/10.1007/978-3-319-97909-0_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97908-3
Online ISBN: 978-3-319-97909-0
eBook Packages: Computer Science, Computer Science (R0)