Voice Emotion Recognition in Real Time Applications

Aghajani, Mahsa; Ben Abdessalem, Hamdi; Frasson, Claude

doi:10.1007/978-3-030-80421-3_53

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12677)

Included in the following conference series:

International Conference on Intelligent Tutoring Systems

Abstract

This paper reports results on real-time voice emotion recognition using machine learning models. The models are trained on several widely used audio emotion datasets together with a custom dataset. The custom dataset was recorded from non-actors and non-experts who imagined themselves in scenarios intended to elicit the target emotion. The purpose of including this dataset is to make the models proficient at recognizing emotions in people who do not express their emotions clearly in their voices. We compare the results of several machine learning classifiers on five emotions: anger, happiness, sadness, neutrality and surprise. Models were evaluated with and without the custom dataset to show the effect of including an imperfect dataset. Without the custom dataset, the ensemble models gradient boosting, bagging and random forest reached validation accuracies of 89.82%, 88.58% and 84.83% respectively, higher than the other evaluated models. With the custom dataset, these ensemble methods again outperformed the other models, with accuracies of 87.34%, 86.71% and 82.98% respectively. Thus, although including the custom dataset lowers the overall accuracy, it better equips the models to predict emotions in everyday scenarios.
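
To make the size of the accuracy trade-off concrete, the short sketch below computes the drop in validation accuracy for each ensemble model when the custom dataset is included. The accuracy figures are the ones reported in the abstract; the function and variable names are illustrative only and are not taken from the paper.

```python
# Validation accuracies (%) as reported in the abstract:
# (without custom dataset, with custom dataset).
REPORTED = {
    "gradient boosting": (89.82, 87.34),
    "bagging": (88.58, 86.71),
    "random forest": (84.83, 82.98),
}


def accuracy_drops(reported):
    """Return the drop in percentage points per model, rounded to 2 decimals.

    `reported` maps a model name to a (without_custom, with_custom) pair.
    This helper is hypothetical; it only restates the abstract's numbers.
    """
    return {
        model: round(without_custom - with_custom, 2)
        for model, (without_custom, with_custom) in reported.items()
    }


drops = accuracy_drops(REPORTED)
for model, drop in drops.items():
    print(f"{model}: -{drop:.2f} percentage points")
```

On these figures the drop is about 2.5 percentage points for gradient boosting and just under 2 points for bagging and random forest, and the ranking of the three models is unchanged.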

Acknowledgment

We acknowledge NSERC-CRD (Natural Sciences and Engineering Research Council of Canada, Collaborative Research and Development), Prompt, and BMU (Beam Me Up) for funding this work.

Author information

Authors and Affiliations

Département d’Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, H3C 3J7, Canada
Mahsa Aghajani, Hamdi Ben Abdessalem & Claude Frasson

Corresponding author

Correspondence to Mahsa Aghajani.

Editor information

Editors and Affiliations

Department of Computer Science, Durham University, Durham, UK
Alexandra I. Cristea
University of West Attica, Aigaleo, Greece
Christos Troussas

About this paper

Cite this paper

Aghajani, M., Ben Abdessalem, H., Frasson, C. (2021). Voice Emotion Recognition in Real Time Applications. In: Cristea, A.I., Troussas, C. (eds) Intelligent Tutoring Systems. ITS 2021. Lecture Notes in Computer Science, vol 12677. Springer, Cham. https://doi.org/10.1007/978-3-030-80421-3_53

DOI: https://doi.org/10.1007/978-3-030-80421-3_53
Published: 09 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80420-6
Online ISBN: 978-3-030-80421-3
eBook Packages: Computer Science, Computer Science (R0)