In-Car Violence Detection Based on the Audio Signal

Santos, Flávio; Durães, Dalila; Marcondes, Francisco S.; Hammerschmidt, Niklas; Lange, Sascha; Machado, José; Novais, Paulo

doi:10.1007/978-3-030-91608-4_43

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13113)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Detecting violence inside a car from sound touches several audio domains, namely speech, music, and ambient sound, and it requires identifying the similarities and differences between them. The recent surge of interest in deep learning has enabled practical applications in many areas of signal processing, often surpassing traditional signal processing on a large scale. This paper presents a comparative study of state-of-the-art deep learning architectures applied to in-car violence detection based only on the audio signal. Following an in-depth review of the literature, the Mel-spectrogram was chosen as the audio signal representation. For the experiments, we built an in-car video dataset and applied four different deep learning architectures to the classification problem. The results show that the ResNet-18 model achieves the best accuracy on the test set.
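To make the Mel-spectrogram representation concrete, the following is a minimal sketch of how a log-Mel spectrogram can be computed from a mono waveform. It is not the authors' pipeline: the abstract does not state the sample rate, FFT size, hop length, or number of Mel bands, so the values below (16 kHz, 512, 256, 40) are assumptions, as are the function names and the choice of the HTK-style Mel formula. The resulting 2-D array is the kind of image-like input that a convolutional classifier such as ResNet-18 could consume.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (HTK-style formula, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Build triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Triangle: the smaller of the two slopes, clipped at zero outside [lo, hi]
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Return a log-mel spectrogram in dB with shape (n_mels, n_frames).

    Assumes len(signal) >= n_fft. All default parameters are illustrative.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T   # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                 # small offset avoids log(0)


# Usage: one second of a synthetic 440 Hz tone standing in for real in-car audio
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(tone, sr)
print(S.shape)
```

In practice, an audio library would typically replace the hand-written filterbank; the explicit version is shown only to expose each step (framing, windowing, power spectrum, Mel filtering, log compression).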

Acknowledgments

This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project n° 039334; Funding Reference: POCI-01-0247-FEDER-039334].

Author information

Authors and Affiliations

Algorithm Center, University of Minho, Braga, Portugal
Flávio Santos, Dalila Durães, Francisco S. Marcondes, José Machado & Paulo Novais
Bosch Car Multimedia, Braga, Portugal
Niklas Hammerschmidt & Sascha Lange

Corresponding author

Correspondence to Dalila Durães.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Universidad Politecnica de Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Manchester, Manchester, UK
Richard Allmendinger
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
Southern University of Science and Technology, Shenzhen, China
Ke Tang
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
University of Minho, Braga, Portugal
Paulo Novais
NOVA University of Lisbon, Lisbon, Portugal
Susana Nascimento

About this paper

Cite this paper

Santos, F. et al. (2021). In-Car Violence Detection Based on the Audio Signal. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_43

DOI: https://doi.org/10.1007/978-3-030-91608-4_43
Published: 23 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91607-7
Online ISBN: 978-3-030-91608-4
eBook Packages: Computer Science; Computer Science (R0)
