
Breaking down violence: A deep-learning strategy to model and classify violence in videos

Published: 27 August 2018

Abstract

Detecting violence in videos through automatic means is important for law enforcement and for the analysis of surveillance footage aimed at maintaining public safety. It can also be a valuable tool for protecting children from inappropriate content and for helping parents make better-informed decisions about what their kids watch. However, this is a challenging problem, since the very definition of violence is broad and highly subjective. Hence, detecting such nuances in videos without human supervision is not only a technical but also a conceptual problem. With this in mind, we explore how to better describe the idea of violence to a convolutional neural network by breaking it into more objective and concrete parts. Our method first uses independent networks to learn features for more specific concepts related to violence, such as fights, explosions, and blood. We then use these features to classify each concept and later fuse the results in a meta-classification that describes violence. We also explore how to represent time-based events in still images used as network inputs, since many violent acts are defined in terms of movement. We show that using more specific concepts is an intuitive and effective solution, and that the concepts are complementary, combining to form a more robust definition of violence. Compared to other methods for violence detection, our approach achieves better classification quality while using only automatically learned features.
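The two-stage pipeline described in the abstract — independent per-concept classifiers whose scores are fused by a meta-classifier — can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: the concept list, feature dimensions, synthetic data, and the use of logistic regression (standing in for the authors' CNN features and downstream classifiers) are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative concept list; the paper mentions fights, explosions, blood, etc.
CONCEPTS = ["fight", "explosion", "blood"]

rng = np.random.default_rng(0)

# Stand-in for per-concept CNN features (one network per concept):
# 100 clips, each described by a 64-dim feature vector per concept.
features = {c: rng.normal(size=(100, 64)) for c in CONCEPTS}
labels = rng.integers(0, 2, size=100)  # 1 = violent clip (synthetic)

# Stage 1: one binary classifier per concept produces a concept score.
concept_scores = []
for c in CONCEPTS:
    clf = LogisticRegression(max_iter=1000).fit(features[c], labels)
    concept_scores.append(clf.predict_proba(features[c])[:, 1])

# Stage 2: fuse the per-concept scores with a meta-classifier
# that outputs the final "violence" decision.
meta_input = np.stack(concept_scores, axis=1)  # shape (100, 3)
meta = LogisticRegression().fit(meta_input, labels)
violence_prob = meta.predict_proba(meta_input)[:, 1]
```

The point of the fusion stage is that each concept network only has to learn one concrete, objective notion (a fight, an explosion), while the meta-classifier learns how those concepts combine into the subjective notion of violence.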



Published In

ARES '18: Proceedings of the 13th International Conference on Availability, Reliability and Security
August 2018
603 pages
ISBN:9781450364485
DOI:10.1145/3230833

In-Cooperation

  • Universität Hamburg

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Deep-learning
  2. Semantic Concept Detection
  3. Violence Classification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ARES 2018

Acceptance Rates

ARES '18 paper acceptance rate: 128 of 260 submissions (49%)
Overall acceptance rate: 228 of 451 submissions (51%)

Cited By

  • (2024) IIVRS: an Intelligent Image and Video Rating System to Provide Scenario-Based Content for Different Users. Interacting with Computers 36(6), 406--415. DOI: 10.1093/iwc/iwae034. Online publication date: 21-Jul-2024.
  • (2024) Revisiting vision-based violence detection in videos: A critical analysis. Neurocomputing 597, 128113. DOI: 10.1016/j.neucom.2024.128113. Online publication date: Sep-2024.
  • (2023) Hybrid CNN-LSTM Model for Automated Violence Detection and Classification in Surveillance Systems. 2023 12th International Conference on System Modeling & Advancement in Research Trends (SMART), 169--175. DOI: 10.1109/SMART59791.2023.10428538. Online publication date: 22-Dec-2023.
  • (2023) Using two-stream EfficientNet-BiLSTM network for multiclass classification of disturbing YouTube videos. Multimedia Tools and Applications 83(12), 36519--36546. DOI: 10.1007/s11042-023-15774-3. Online publication date: 17-May-2023.
  • (2022) Deep Learning for Activity Recognition Using Audio and Video. Electronics 11(5), 782. DOI: 10.3390/electronics11050782. Online publication date: 3-Mar-2022.
  • (2022) Human Activity Classification Using the 3DCNN Architecture. Applied Sciences 12(2), 931. DOI: 10.3390/app12020931. Online publication date: 17-Jan-2022.
  • (2021) What should we pay attention to when classifying violent videos? The 16th International Conference on Availability, Reliability and Security, 1--10. DOI: 10.1145/3465481.3470059. Online publication date: 17-Aug-2021.
  • (2021) Detecting Violent Arm Movements Using CNN-LSTM. 2021 5th International Conference on Electrical Information and Communication Technology (EICT), 1--6. DOI: 10.1109/EICT54103.2021.9733510. Online publication date: 17-Dec-2021.
  • (2021) Suspicious Activity Recognition Using Proposed Deep L4-Branched-Actionnet With Entropy Coded Ant Colony System Optimization. IEEE Access 9, 89181--89197. DOI: 10.1109/ACCESS.2021.3091081. Online publication date: 2021.
  • (2020) Violent Behavioral Activity Classification Using Artificial Neural Network. 2020 New Trends in Signal Processing (NTSP), 1--5. DOI: 10.1109/NTSP49686.2020.9229532. Online publication date: 14-Oct-2020.