Abstract
Violence has always been part of humanity. It takes different forms, of which physical violence is the most recurrent in daily life. This type of violence affects a growing number of people, so it is essential to try to combat it. In recent years, human action recognition has been studied extensively, but mainly in video, an important area of computer vision. Audio can circumvent some of the limitations of video. Audio sensors can be omnidirectional, and audio demands less processing power and fewer hardware and software resources than video. Audio can also convey emotion. It is not affected by lighting or temperature problems, and the sensor does not need to be at a favourable angle to capture the intended information. For these reasons, audio is regarded here as the most suitable modality for recognizing violence when combined with Machine Learning, Deep Learning, and Transfer Learning techniques. In this paper we test a Convolutional Neural Network (CNN), a ResNet50, a VGG16, and a VGG19 on the task of classifying audio clips. The CNN obtains the best results, with 92.44% accuracy on the test set. ResNet50 is the worst-performing model, with 86.34% accuracy. Both VGG models show good potential but do not outperform the CNN.
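The comparison above ranks models by accuracy on a held-out test set. As a minimal sketch of that metric (not the authors' evaluation code), the snippet below computes accuracy for two hypothetical classifiers and picks the better one. The labels, the predictions, and the names `model_a` and `model_b` are invented toy values for illustration only and do not come from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of clips whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy test set: 1 = violent clip, 0 = non-violent clip.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from two placeholder models.
predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 1],  # 7 of 8 correct
    "model_b": [1, 1, 0, 1, 0, 0, 1, 1],  # 5 of 8 correct
}

# Score every model on the same test set, then select the highest accuracy.
scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2%}")
print("best:", best)
```

Scoring every model on the same test labels is what makes figures such as 92.44% and 86.34% directly comparable.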
Supported by organization ALGORITMI Centre.
Notes
- 1.
moviepy library: https://github.com/Zulko/moviepy
- 2.
pydub library: https://github.com/jiaaro/pydub
- 3.
librosa library: https://librosa.org/doc/latest/index.html
Acknowledgements
This work is supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/2020.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Veloso, B., Durães, D., Novais, P. (2022). Analysis of Machine Learning Algorithms for Violence Detection in Audio. In: González-Briones, A., et al. Highlights in Practical Applications of Agents, Multi-Agent Systems, and Complex Systems Simulation. The PAAMS Collection. PAAMS 2022. Communications in Computer and Information Science, vol 1678. Springer, Cham. https://doi.org/10.1007/978-3-031-18697-4_17
DOI: https://doi.org/10.1007/978-3-031-18697-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18696-7
Online ISBN: 978-3-031-18697-4
eBook Packages: Computer Science, Computer Science (R0)