An audio-based anger detection algorithm using a hybrid artificial neural network and fuzzy logic model

Published in Multimedia Tools and Applications

Abstract

Audio Emotion Recognition (AER) is an important component of human emotion analysis, with or without accompanying visual cues. Speech audio carries modular parameters such as rhythm, tone, and pitch. Emotions, however, are highly complex, and the ability of human listeners to instantly understand the emotions conveyed to their ears is a skill refined over thousands of years of human evolution. Artificial intelligence (AI)-enabled AER has attracted worldwide attention and gained increasing importance among AI researchers in various fields over the last couple of years, especially after the start of the Covid-19 pandemic, when large-scale lockdowns and movement control orders around the world shifted work, schooling, and learning online on a mass scale. The audio quality on online platforms differs from device to device and depends on the quality and bandwidth of the Internet connection used. As the world recovers from the Covid-19 pandemic, an anger detection algorithm is therefore valuable for maintaining public security and general safety, and it can also help in the early detection of mental health or anger management issues. An angry person in public can pose a threat to the people around them and a risk of damage to public property, so detecting anger in voices in public places serves as a first line of defense against outbreaks of public nuisance or even violent crime. Moreover, the more prominent a person's anger, the more attention public security forces should devote to that person. This study uses a collection of audio files from the CREMA-D dataset as input: 364 audio files from 91 actors, each spoken with three degrees of anger and a neutral emotion, all using the script "It's eleven o'clock". A hybrid algorithm of an artificial neural network (ANN) and fuzzy logic is introduced, together with a dedicated preprocessing technique designed specifically for handling audio files. A comprehensive discussion and analysis of the results is presented, in which the proposed algorithm is compared with the other audio classification algorithms in the literature, many of which merely deploy a ready-made, general-purpose neural network-based algorithm. This brute-force reliance on an overly complicated computational structure proves inefficient, as the number of nodes involved in the computation far surpasses the number of preprocessed inputs. In addition, descriptions of the preprocessing procedures for audio classification in recent works are often unclear. Finally, the limitations of the experimental setup, suggestions for improvement, and potential applications of the findings are discussed and analyzed in the conclusion of this study.
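The abstract describes a hybrid of an ANN and fuzzy logic but, as an abstract, gives no implementation detail. Purely as an illustration of how such a hybrid could be wired together, the sketch below has a small feed-forward network produce a continuous anger score from averaged MFCC features, and triangular fuzzy membership functions translate that score into linguistic intensity grades. The feature choice (librosa MFCCs), network shape, weight values, membership breakpoints, and the example filename are all assumptions of this sketch, not the design used in the paper.

```python
# Illustrative sketch only: a tiny ANN scores an utterance for anger, and
# triangular fuzzy membership functions map the score to intensity grades
# (neutral / low / medium / high anger). All parameters are placeholders.
import numpy as np
import librosa

def extract_features(wav_path, n_mfcc=13):
    """Average MFCCs over time to get one fixed-length vector per clip."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                      # shape: (n_mfcc,)

def ann_anger_score(x, w1, b1, w2, b2):
    """One hidden tanh layer, sigmoid output in [0, 1] (higher = angrier)."""
    h = np.tanh(x @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

def triangular(x, a, b, c):
    """Standard triangular fuzzy membership function on [a, c], peaking at b."""
    return np.clip(np.minimum((x - a) / (b - a + 1e-9),
                              (c - x) / (c - b + 1e-9)), 0.0, 1.0)

def fuzzify(score):
    """Map the ANN score to membership degrees for each anger grade."""
    return {
        "neutral":      triangular(score, -0.20, 0.00, 0.35),
        "low anger":    triangular(score,  0.20, 0.45, 0.65),
        "medium anger": triangular(score,  0.50, 0.70, 0.85),
        "high anger":   triangular(score,  0.75, 1.00, 1.20),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w1, b1 = rng.normal(size=(13, 8)), np.zeros(8)   # untrained demo weights
    w2, b2 = rng.normal(size=8), 0.0
    feats = extract_features("1001_IEO_ANG_HI.wav")  # hypothetical file name
    score = ann_anger_score(feats, w1, b1, w2, b2)
    grades = fuzzify(score)
    print(f"anger score = {score:.2f}", max(grades, key=grades.get))
```

In such a setup the fuzzy layer is what turns a single network output into graded linguistic labels, matching the abstract's point that more prominent anger should attract proportionally more attention.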


Data Availability

This study uses the publicly available CREMA-D dataset, which can be accessed on GitHub at https://github.com/CheyneyComputerScience/CREMA-D.
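As a rough illustration of how the subset described in the abstract (91 actors, three anger intensities plus neutral, all on the sentence "It's eleven o'clock") might be pulled from a local copy of the dataset, the sketch below filters filenames, assuming the ActorID_Sentence_Emotion_Level naming scheme documented in the CREMA-D repository (e.g. 1001_IEO_ANG_HI.wav); the directory path is a placeholder.

```python
# Minimal sketch of selecting the 364-clip anger/neutral "It's eleven o'clock"
# subset from a local CREMA-D checkout. Assumes the repository's documented
# ActorID_Sentence_Emotion_Level.wav naming scheme; AUDIO_DIR is a placeholder.
from pathlib import Path

AUDIO_DIR = Path("CREMA-D/AudioWAV")          # placeholder path to the .wav files

def is_ieo_anger_or_neutral(path: Path) -> bool:
    actor, sentence, emotion, level = path.stem.split("_")
    if sentence != "IEO":                     # keep only "It's eleven o'clock"
        return False
    # Angry clips come in three intensities (LO/MD/HI); neutral is unspecified (XX).
    return (emotion == "ANG" and level in {"LO", "MD", "HI"}) or emotion == "NEU"

selected = sorted(p for p in AUDIO_DIR.glob("*.wav") if is_ieo_anger_or_neutral(p))
print(len(selected), "clips selected")        # expected: 364 (91 actors x 4 clips)
```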


Acknowledgements

The authors would like to thank the Editor-in-Chief, Editor(s), and the anonymous reviewers for their valuable comments and suggestions, which have helped to improve the quality and clarity of the paper.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Contributions

All authors contributed to the conception and design of the study. Material preparation, data collection, data visualization and data analysis were performed by Arihant Surana, Manish Rathod, Shilpa Gite and Shruti Patil. Advanced data analysis and validation were done by Ketan Kotecha, Ganeshsree Selvachandran, Shio Gai Quek and Ajith Abraham. The first draft of the manuscript was written by Arihant Surana and Manish Rathod, while the second draft of the manuscript was written by Shilpa Gite, Shruti Patil and Ketan Kotecha. This project was supervised and administered by Shilpa Gite, Shruti Patil and Ketan Kotecha. The final draft and the revised manuscript were prepared and edited by Shio Gai Quek, Ganeshsree Selvachandran and Ajith Abraham. All authors commented on previous versions of the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Shilpa Gite, Ketan Kotecha or Ganeshsree Selvachandran.

Ethics declarations

Ethical Compliance

(i). Authors’ declaration: This manuscript is the authors' original work and has not been published elsewhere. All authors have checked the manuscript and have agreed to this submission.

(ii). Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Surana, A., Rathod, M., Gite, S. et al. An audio-based anger detection algorithm using a hybrid artificial neural network and fuzzy logic model. Multimed Tools Appl 83, 38909–38929 (2024). https://doi.org/10.1007/s11042-023-16815-7

