Real Time Violence Detection Based on Deep Spatio-Temporal Features

Xia, Qing; Zhang, Ping; Wang, JingJing; Tian, Ming; Fei, Chun

doi:10.1007/978-3-319-97909-0_17


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10996)

Included in the following conference series:

Chinese Conference on Biometric Recognition


Abstract

Typical manually selected features are insufficient to reliably detect violent actions. In this paper, we present a violence detection model based on a bi-channel convolutional neural network (CNN) and support vector machines (SVMs). The major contributions are twofold: (1) We feed the original frames and the differential images into the proposed bi-channel CNN to obtain appearance features and motion features, respectively. (2) Linear SVMs are adopted to classify the features, and a label fusion approach is proposed to improve detection performance by integrating the appearance and motion information. We compare the proposed model with several state-of-the-art methods on two datasets. The results are promising, and the proposed method achieves real-time performance of 30 fps.
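
The two ingredients named in the abstract, differential images and label fusion, can be illustrated with a minimal sketch. The abstract does not define either step precisely, so everything here is an assumption: differential images are taken to be absolute differences of consecutive grayscale frames, and fusion is taken to be a weighted average of the two linear-SVM decision scores. The names `differential_images`, `fuse_labels`, `weight`, and `threshold` are hypothetical and are not taken from the paper.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel intensities


def differential_images(frames: List[Frame]) -> List[Frame]:
    """Absolute pixel-wise difference between each pair of consecutive frames.

    Assumption: this is one common reading of "differential images";
    the paper may compute them differently.
    """
    return [
        [[abs(b - a) for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def fuse_labels(appearance_score: float, motion_score: float,
                weight: float = 0.5, threshold: float = 0.0) -> int:
    """Hypothetical late fusion of two linear-SVM decision scores.

    Returns 1 (violent) if the weighted score exceeds the threshold, else 0.
    The weighting rule is an assumption, not the paper's fusion scheme.
    """
    fused = weight * appearance_score + (1.0 - weight) * motion_score
    return int(fused > threshold)


# Toy clip: four 2x2 grayscale frames with uniform brightness 10, 20, 15, 40.
clip = [[[v, v], [v, v]] for v in (10, 20, 15, 40)]
diffs = differential_images(clip)  # three difference frames
label = fuse_labels(appearance_score=-0.2, motion_score=0.9)
```

In this sketch the appearance channel would consume `clip` and the motion channel would consume `diffs`; the CNN feature extractors and the trained SVMs themselves are omitted because the abstract gives no architectural details.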



Author information

Authors and Affiliations

School of Optoelectronic Science and Engineering of UESTC, University of Electronic Science and Technology of China, Chengdu, China
Qing Xia, Ping Zhang, JingJing Wang, Ming Tian & Chun Fei


Corresponding authors

Correspondence to Qing Xia or Ping Zhang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Zhou
Beihang University, Beijing, China
Yunhong Wang
Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Xinjiang University, Urumqi, China
Zhenhong Jia
Tsinghua University, Beijing, China
Jianjiang Feng
Chinese Academy of Sciences, Beijing, China
Shiguang Shan
Xinjiang University, Urumqi, China
Kurban Ubul
Tsinghua University, Shenzhen, China
Zhenhua Guo


About this paper

Cite this paper

Xia, Q., Zhang, P., Wang, J., Tian, M., Fei, C. (2018). Real Time Violence Detection Based on Deep Spatio-Temporal Features. In: Zhou, J., et al. Biometric Recognition. CCBR 2018. Lecture Notes in Computer Science, vol 10996. Springer, Cham. https://doi.org/10.1007/978-3-319-97909-0_17


DOI: https://doi.org/10.1007/978-3-319-97909-0_17
Published: 09 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97908-3
Online ISBN: 978-3-319-97909-0
eBook Packages: Computer Science, Computer Science (R0)
