Multimodal Violence Detection in Hollywood Movies: State-of-the-Art and Benchmarking

  • Chapter

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

This chapter introduces a benchmark evaluation targeting the detection of violent scenes in Hollywood movies. The evaluation was run in 2011 and 2012 as an affect task within the international MediaEval benchmark initiative. We report on these two years of evaluation: we describe the dataset created, survey the state of the art through the results achieved by participants, and analyze in detail two of the best-performing multimodal systems. We then draw on the lessons learned over these two years to offer insights for future work, with an emphasis on multimodal modeling and fusion.

Notes

  1. http://www.multimediaeval.org/

  2. http://www.technicolor.com/

  3. http://www.nist.gov/itl/iad/mig/sed.cfm

  4. The development data is intended for designing and training the approaches.

  5. The test set data is intended for the official benchmarking.

  6. The Yaafe toolkit for audio feature extraction was used.

Acknowledgments

This work was partially supported by the Quaero Program. We would also like to acknowledge the MediaEval Multimedia Benchmark for providing the framework to evaluate the task of violent scene detection. We are also grateful to the participants for consenting to have their systems and results described in this paper. More information about the MediaEval campaign is available at http://www.multimediaeval.org/. The working notes proceedings of MediaEval 2011 and 2012, which include the participants’ contributions, can be found online at http://www.ceur-ws.org/Vol-807 and http://www.ceur-ws.org/Vol-927, respectively.

Author information

Corresponding author

Correspondence to Claire-Hélène Demarty.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Demarty, CH., Penet, C., Ionescu, B., Gravier, G., Soleymani, M. (2014). Multimodal Violence Detection in Hollywood Movies: State-of-the-Art and Benchmarking. In: Ionescu, B., Benois-Pineau, J., Piatrik, T., Quénot, G. (eds) Fusion in Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-05696-8_8

  • DOI: https://doi.org/10.1007/978-3-319-05696-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05695-1

  • Online ISBN: 978-3-319-05696-8

  • eBook Packages: Computer Science (R0)
