Speech emotion recognition using optimized genetic algorithm-extreme learning machine

Published in Multimedia Tools and Applications

Abstract

Automatic Emotion Speech Recognition (ESR) is an active research field in Human-Computer Interaction (HCI). An ESR system typically consists of two main parts: a front end (feature extraction) and a back end (classification). However, most previous ESR systems have focused on the feature extraction part only and neglected classification, even though classification is an essential part of an ESR system: its role is to map the features extracted from audio samples to their corresponding emotions. Moreover, most ESR systems have been evaluated under the Subject Independent (SI) scenario only. In this paper, therefore, we focus on the back end (classification) by adopting our recently developed Extreme Learning Machine (ELM), the Optimized Genetic Algorithm-Extreme Learning Machine (OGA-ELM). In addition, we use the Mel Frequency Cepstral Coefficients (MFCC) method to extract features from the speech utterances. This work demonstrates the significance of the classification part in ESR systems, showing that it improves ESR performance in terms of accuracy. The performance of the proposed model was evaluated on the Berlin Emotional Speech (BES) dataset, which covers seven emotions (neutral, happiness, boredom, anxiety, sadness, anger, and disgust). Four evaluation scenarios were conducted: Subject Dependent (SD), SI, Gender Dependent Female (GD-Female), and Gender Dependent Male (GD-Male). OGA-ELM achieved impressive accuracies of 93.26%, 100.00%, 96.14%, and 97.10% for the SI, SD, GD-Male, and GD-Female scenarios, respectively. In addition, the proposed ESR system showed fast execution times in identifying emotions across all experiments.
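To illustrate the back-end idea the abstract describes, the following is a minimal sketch of a *basic* ELM classifier in NumPy. It is illustrative only: the paper's OGA-ELM additionally optimizes the randomly initialized hidden-layer parameters with a genetic algorithm, which is not reproduced here, and the toy Gaussian feature vectors merely stand in for MFCC vectors extracted from speech.

```python
import numpy as np

class ELMClassifier:
    """Basic Extreme Learning Machine: random fixed hidden layer,
    output weights solved in closed form via the pseudo-inverse."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Nonlinear hidden-layer activations (tanh used here)
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Random input weights and biases: fixed, never trained
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # One-hot target matrix
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights via Moore-Penrose pseudo-inverse (no iteration)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]

# Toy usage: two well-separated clusters standing in for two emotion
# classes, with 13-dimensional vectors mimicking MFCC feature size.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50, 13)) + 2.0,
               rng.standard_normal((50, 13)) - 2.0])
y = np.array([0] * 50 + [1] * 50)
acc = (ELMClassifier().fit(X, y).predict(X) == y).mean()
```

Because the output weights are obtained in a single least-squares solve rather than by iterative gradient descent, training is very fast, which is consistent with the fast execution times reported for the proposed system.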



Acknowledgements

This project was funded by the Universiti Kebangsaan Malaysia under Dana Impak Perdana grant (Research code: GUP-2020-063).

Corresponding author

Correspondence to Musatafa Abbas Abbood Albadr.


Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Albadr, M.A.A., Tiun, S., Ayob, M. et al. Speech emotion recognition using optimized genetic algorithm-extreme learning machine. Multimed Tools Appl 81, 23963–23989 (2022). https://doi.org/10.1007/s11042-022-12747-w

