Multi-stream Deep Networks for Person to Person Violence Detection in Videos

  • Conference paper
Part of the book series: Communications in Computer and Information Science ((CCIS,volume 662))

Abstract

Violence detection in videos has numerous applications, ranging from parental control and child protection to multimedia filtering and retrieval. A number of approaches have been proposed to detect vital cues of violent actions, most of which employ trajectory-based action recognition techniques. However, these methods model only the general characteristics of human actions and thus cannot capture the specific high-order information of violent actions; as a result, they are poorly suited to detecting violence, which is typically intense and correlated with specific scenes. In this paper, we propose a novel framework, multi-stream deep convolutional neural networks, for person-to-person violence detection in videos. In addition to the conventional spatial and temporal streams, we develop an acceleration stream to capture the intense motion information usually involved in violent actions. Moreover, we propose a simple and effective score-level fusion strategy to integrate the multi-stream information. We demonstrate the effectiveness of our method on a typical violence dataset, and extensive experimental results show its superiority over state-of-the-art methods.
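The score-level fusion mentioned in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's exact strategy: it takes the per-class softmax scores produced by the spatial, temporal, and acceleration streams and combines them with a weighted average before taking the argmax. The function name, the two-class layout (violent vs. non-violent), and the uniform default weights are all hypothetical choices for the sketch.

```python
import numpy as np

def fuse_scores(spatial, temporal, acceleration, weights=(1.0, 1.0, 1.0)):
    """Weighted average of per-class softmax scores from three streams.

    Each argument is a 1-D array of class scores from one stream;
    `weights` controls the relative contribution of each stream.
    """
    streams = np.stack([np.asarray(spatial, dtype=float),
                        np.asarray(temporal, dtype=float),
                        np.asarray(acceleration, dtype=float)])
    w = np.asarray(weights, dtype=float)[:, None]
    return (w * streams).sum(axis=0) / w.sum()

# Hypothetical per-class scores (violent, non-violent) from each stream:
spatial_scores = np.array([0.6, 0.4])
temporal_scores = np.array([0.8, 0.2])
accel_scores = np.array([0.9, 0.1])

fused = fuse_scores(spatial_scores, temporal_scores, accel_scores)
prediction = int(np.argmax(fused))  # index 0 corresponds to "violent" here
```

With uniform weights this reduces to a plain average of the three score vectors; per-stream weights could instead be tuned on a validation set, which is the usual practice for late fusion.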



Acknowledgment

This work was supported by the Hong Kong, Macao and Taiwan Science Technology Cooperation Program of China (No. L2015TGA9004), and the National Natural Science Foundation of China (No. 61573045).

Author information

Correspondence to Jie Qin.


Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dong, Z., Qin, J., Wang, Y. (2016). Multi-stream Deep Networks for Person to Person Violence Detection in Videos. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 662. Springer, Singapore. https://doi.org/10.1007/978-981-10-3002-4_43

  • DOI: https://doi.org/10.1007/978-981-10-3002-4_43

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3001-7

  • Online ISBN: 978-981-10-3002-4

  • eBook Packages: Computer Science (R0)
