Abstract

Violence has always been part of humanity. There are different types of violence, and physical violence is the most recurrent in our daily lives. Because it increasingly affects many people's lives, it is essential to try to combat it. In recent years, human action recognition, an important area of computer vision, has been studied extensively, but mainly in video. Audio can circumvent several limitations of video-based approaches: audio sensors can be omnidirectional and require less processing power and less demanding hardware and software than video. Audio can also convey emotions, is not affected by lighting or temperature conditions, and does not need a favourable angle to capture the intended information. For these reasons, audio is regarded here as the best way to recognize violence when combined with Machine Learning, Deep Learning, and Transfer Learning techniques. In this paper, we test a Convolutional Neural Network (CNN), ResNet50, VGG16, and VGG19 for audio classification. The CNN obtains the best results, with 92.44% accuracy on the test set, while ResNet50 performs worst, with 86.34% accuracy. Both VGG models show good potential but do not outperform the CNN.
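The abstract does not describe the CNN's layers. As a rough, hypothetical illustration of what one convolutional layer does when audio is represented as a 2-D time-frequency grid (an assumption; the input representation is not stated in this excerpt), the following stdlib-only Python sketch applies a single 2×2 filter, a ReLU, and 2×2 max pooling. The toy grid, the kernel values, and all function names are invented for illustration and are not the authors' model.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products.

    Like deep-learning "convolution" layers, this is a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    """Element-wise ReLU: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool_2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


# Hypothetical toy grid: rows = frequency bands, columns = time frames.
# Energy appears suddenly at time frame 2 (an "onset").
spectrogram: Matrix = [
    [0, 0, 4, 4, 4],
    [0, 0, 4, 4, 4],
    [0, 0, 2, 2, 2],
    [0, 0, 2, 2, 2],
    [0, 0, 0, 0, 0],
]

# Hand-set kernel that responds to energy rising from one frame to the next
# across two neighbouring bands (a trained CNN would learn its kernels).
onset_kernel: Matrix = [
    [-1, 1],
    [-1, 1],
]

pooled = max_pool_2x2(relu(conv2d_valid(spectrogram, onset_kernel)))
print(pooled)  # [[8.0, 0.0], [4.0, 0.0]]
```

The pooled map keeps a strong response only in the region where the onset occurs and is zero elsewhere. A real CNN stacks many such learned filters and ends in a classifier that outputs class scores (for example, violent vs. non-violent).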

Supported by the ALGORITMI Centre.

Notes

  1. moviepy library: https://github.com/Zulko/moviepy.

  2. pydub library: https://github.com/jiaaro/pydub.

  3. librosa library: https://librosa.org/doc/latest/index.html.
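The notes list moviepy, pydub, and librosa, which suggests a preprocessing stage (extracting the audio track, cutting it into clips, and computing features) before classification; this excerpt does not state the exact steps or parameters. The following stdlib-only sketch illustrates two plausible steps under assumed values: cutting a mono signal into fixed-length clips (similar in spirit to slicing a pydub `AudioSegment` by milliseconds) and placing mel-scale band edges of the kind used by mel-spectrogram features. The sample rate, clip length, number of mel bands, and all function names are hypothetical. The HTK mel formula is used here; librosa's `hz_to_mel` uses the Slaney variant unless `htk=True` is passed.

```python
import math
from typing import List, Sequence


def split_into_clips(samples: Sequence[float], sample_rate: int,
                     clip_seconds: float) -> List[List[float]]:
    """Cut a mono signal into consecutive, non-overlapping clips of equal length.

    A trailing remainder shorter than one clip is discarded (an assumption;
    it could instead be zero-padded).
    """
    clip_len = int(round(sample_rate * clip_seconds))
    if clip_len <= 0:
        raise ValueError("clip length must be at least one sample")
    n_full = len(samples) // clip_len
    return [list(samples[k * clip_len:(k + 1) * clip_len]) for k in range(n_full)]


def hz_to_mel_htk(freq_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel: float) -> float:
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Return n_mels + 2 frequencies (Hz) equally spaced on the mel scale.

    Each consecutive triple (left, centre, right) defines one triangular
    filter of a mel filterbank.
    """
    m_lo, m_hi = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz_htk(m_lo + i * step) for i in range(n_mels + 2)]


SAMPLE_RATE = 16_000  # hypothetical sample rate (Hz)
signal = [0.0] * int(2.5 * SAMPLE_RATE)  # 2.5 s of silence standing in for real audio

clips = split_into_clips(signal, SAMPLE_RATE, clip_seconds=1.0)
edges = mel_band_edges(n_mels=8, f_min=0.0, f_max=SAMPLE_RATE / 2)

print(len(clips), [len(c) for c in clips])  # 2 [16000, 16000]
print([round(f, 1) for f in edges])
```

The band edges are spaced evenly in mel but grow wider in Hz as frequency increases, which is why mel-based representations give more resolution to low frequencies. Fixed-length clips give every example the same input size, which image-style models such as VGG16, VGG19, and ResNet50 expect.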

Acknowledgements

This work is supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/2020.

Author information

Corresponding author

Correspondence to Dalila Durães.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Veloso, B., Durães, D., Novais, P. (2022). Analysis of Machine Learning Algorithms for Violence Detection in Audio. In: González-Briones, A., et al. Highlights in Practical Applications of Agents, Multi-Agent Systems, and Complex Systems Simulation. The PAAMS Collection. PAAMS 2022. Communications in Computer and Information Science, vol 1678. Springer, Cham. https://doi.org/10.1007/978-3-031-18697-4_17

  • DOI: https://doi.org/10.1007/978-3-031-18697-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18696-7

  • Online ISBN: 978-3-031-18697-4

  • eBook Packages: Computer Science, Computer Science (R0)
