
Speech Emotion Recognition: A Comprehensive Survey


Abstract

Speech emotion recognition (SER) is a relatively new topic in speech processing and plays an essential role in human interaction. Selecting a database, extracting suitable features, and designing an appropriate classifier are the three significant aspects of building an SER system. This article reviews the work on speech emotion recognition and is intended as a basis for further research. First, the databases used to evaluate SER system performance are described. Second, the choice of features for representing speech is presented. Third, the design of a suitable classifier is discussed, and the fourth section examines multiple classifier systems and their impact on performance. In the fifth part of the article, we review the most important challenges facing speech emotion recognition systems. Finally, the results obtained by existing systems and their constraints are discussed, and directions for improving speech emotion recognition systems are given.
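The three-stage pipeline the survey is organized around (database, feature extraction, classifier) can be made concrete with a small sketch. The example below is not from the paper: it is a minimal illustration assuming librosa and scikit-learn, with synthetic tones standing in for a labelled corpus such as EMO-DB or SAVEE; the labels and signals are hypothetical placeholders.

```python
# Minimal SER sketch: utterance-level MFCC features + an SVM classifier.
# Assumes librosa and scikit-learn; synthetic data replaces a real corpus.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SR = 16000  # sampling rate in Hz

def mfcc_features(signal: np.ndarray, sr: int = SR) -> np.ndarray:
    """Mean-pooled MFCCs: a fixed-length representation of one utterance."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, frames)
    return mfcc.mean(axis=1)                                 # pool over time

# Stand-in data: in practice, load labelled utterances from an SER database
# with librosa.load(); here we synthesize one-second tones instead.
rng = np.random.default_rng(0)
def tone(freq: float) -> np.ndarray:
    t = np.linspace(0.0, 1.0, SR, endpoint=False)
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(SR)

signals = [tone(220), tone(220), tone(440), tone(440)]
labels = ["neutral", "neutral", "angry", "angry"]  # hypothetical labels

X = np.stack([mfcc_features(s) for s in signals])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print(clf.predict(X))  # sanity check on the training data itself
```

With a real database, the synthetic tones would be replaced by loaded waveforms, the feature function by the chosen acoustic representation, and the SVM by whichever classifier (or multiple-classifier fusion) the system design calls for.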


Data Availability

Enquiries about data availability should be directed to the authors.


Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author information


Corresponding authors

Correspondence to Mohammed Jawad Al-Dujaili or Abbas Ebrahimi-Moghadam.

Ethics declarations

Conflict of interest

We have no conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Al-Dujaili, M.J., Ebrahimi-Moghadam, A. Speech Emotion Recognition: A Comprehensive Survey. Wireless Pers Commun 129, 2525–2561 (2023). https://doi.org/10.1007/s11277-023-10244-3
