Abstract
Automatic Emotion Speech Recognition (ESR) is an active research field in Human-Computer Interaction (HCI). Typically, an ESR system consists of two main parts: the front end (feature extraction) and the back end (classification). However, most previous ESR systems have focused on the feature extraction part only and neglected classification, even though classification is an essential part of any ESR system: its role is to map the features extracted from audio samples to their corresponding emotions. Moreover, most ESR systems have been evaluated under a Subject Independent (SI) scenario only. In this paper, we therefore focus on the back end (classification), adopting our recently developed Extreme Learning Machine (ELM) variant, the Optimized Genetic Algorithm-Extreme Learning Machine (OGA-ELM). In addition, we use the Mel Frequency Cepstral Coefficients (MFCC) method to extract features from the speech utterances. This work demonstrates the significance of the classification part in ESR systems, showing that it improves ESR performance in terms of accuracy. The proposed model was evaluated on the Berlin Emotional Speech (BES) dataset, which comprises seven emotions (neutral, happiness, boredom, anxiety, sadness, anger, and disgust). Four evaluation scenarios were conducted: Subject Dependent (SD), SI, Gender Dependent Female (GD-Female), and Gender Dependent Male (GD-Male). OGA-ELM achieved impressive peak accuracies of 93.26%, 100.00%, 96.14%, and 97.10% for the SI, SD, GD-Male, and GD-Female scenarios, respectively. In addition, the proposed ESR system identified emotions with a fast execution time in all experiments.
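The back-end pipeline described above can be illustrated with a minimal sketch of a basic ELM classifier operating on pre-extracted MFCC feature vectors. This is a simplified illustration only, not the paper's OGA-ELM: the genetic-algorithm optimization of the hidden-layer weights is omitted, and the weights are drawn at random as in the standard ELM; the class name and parameters are hypothetical.

```python
import numpy as np

class ELMClassifier:
    """Basic Extreme Learning Machine sketch: random hidden layer,
    output weights solved in closed form via the pseudoinverse.
    (OGA-ELM additionally optimizes the hidden weights with a GA.)"""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Random input weights and biases (fixed, never trained)
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # One-hot targets; output weights via Moore-Penrose pseudoinverse
        T = np.eye(n_classes)[y]
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

Because the output weights are obtained in a single linear solve rather than by iterative backpropagation, training is very fast, which is consistent with the short execution times reported for the proposed system.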
Acknowledgements
This project was funded by the Universiti Kebangsaan Malaysia under Dana Impak Perdana grant (Research code: GUP-2020-063).
Cite this article
Albadr, M.A.A., Tiun, S., Ayob, M. et al. Speech emotion recognition using optimized genetic algorithm-extreme learning machine. Multimed Tools Appl 81, 23963–23989 (2022). https://doi.org/10.1007/s11042-022-12747-w