Abstract
Automatic Emotion Speech Recognition (ESR) is an active research field in Human-Computer Interaction (HCI). Typically, an ESR system consists of two main parts: the front end (feature extraction) and the back end (classification). However, most previous ESR systems have focused on the feature extraction part only and neglected classification, even though classification is an essential part of any ESR system: its role is to map the features extracted from audio samples to their corresponding emotions. Moreover, most ESR systems have been evaluated under a Subject Independent (SI) scenario only. In this paper, we therefore focus on the back end (classification), adopting our recently developed Extreme Learning Machine (ELM) variant, the Optimized Genetic Algorithm-Extreme Learning Machine (OGA-ELM). In addition, we use the Mel Frequency Cepstral Coefficients (MFCC) method to extract features from the speech utterances. This work demonstrates the significance of the classification part in ESR systems, showing that it improves ESR performance in terms of accuracy. The proposed model was evaluated on the Berlin Emotional Speech (BES) dataset, which comprises seven emotions (neutral, happiness, boredom, anxiety, sadness, anger, and disgust). Four evaluation scenarios were conducted: Subject Dependent (SD), SI, Gender Dependent Female (GD-Female), and Gender Dependent Male (GD-Male). OGA-ELM achieved impressive peak accuracies of 93.26%, 100.00%, 96.14%, and 97.10% for the SI, SD, GD-Male, and GD-Female scenarios, respectively. In addition, the proposed ESR system identified emotions with a fast execution time in all experiments.
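The back-end pipeline described above can be illustrated with a minimal sketch of a basic ELM classifier operating on pre-extracted MFCC feature vectors. This is a simplified illustration only, not the paper's OGA-ELM: the genetic-algorithm optimization of the hidden-layer weights is omitted, and the weights are drawn at random as in the standard ELM; the class name and parameters are hypothetical.

```python
import numpy as np

class ELMClassifier:
    """Basic Extreme Learning Machine sketch: random hidden layer,
    output weights solved in closed form via the pseudoinverse.
    (OGA-ELM additionally optimizes the hidden weights with a GA.)"""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Random input weights and biases (fixed, never trained)
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # One-hot targets; output weights via Moore-Penrose pseudoinverse
        T = np.eye(n_classes)[y]
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

Because the output weights are obtained in a single linear solve rather than by iterative backpropagation, training is very fast, which is consistent with the short execution times reported for the proposed system.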
Acknowledgements
This project was funded by the Universiti Kebangsaan Malaysia under Dana Impak Perdana grant (Research code: GUP-2020-063).
Cite this article
Albadr, M.A.A., Tiun, S., Ayob, M. et al. Speech emotion recognition using optimized genetic algorithm-extreme learning machine. Multimed Tools Appl 81, 23963–23989 (2022). https://doi.org/10.1007/s11042-022-12747-w