
Speech Emotion Recognition: A Comprehensive Survey


Abstract

Speech emotion recognition (SER) is a relatively new topic in speech processing and plays an essential role in human interaction. Selecting a database, extracting suitable features, and designing an appropriate classifier are the three significant aspects of building an SER system. This article reviews the work on speech emotion recognition and is intended as a basis for further research. First, the databases used to evaluate SER system performance are described. Second, the choice of features for representing speech is presented. Third, the design of a suitable classifier is discussed, and the fourth section examines multiple classifier systems and their impact on performance. In the fifth part of the article, we review the most important challenges facing speech emotion recognition systems. Finally, the results obtained by existing systems and their constraints are discussed, and directions for improving speech emotion recognition systems are given.
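The three-stage pipeline the survey is organized around (database, feature extraction, classifier) can be made concrete with a small sketch. The example below is not from the paper: it is a minimal illustration assuming librosa and scikit-learn, with synthetic tones standing in for a labelled corpus such as EMO-DB or SAVEE; the labels and signals are hypothetical placeholders.

```python
# Minimal SER sketch: utterance-level MFCC features + an SVM classifier.
# Assumes librosa and scikit-learn; synthetic data replaces a real corpus.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SR = 16000  # sampling rate in Hz

def mfcc_features(signal: np.ndarray, sr: int = SR) -> np.ndarray:
    """Mean-pooled MFCCs: a fixed-length representation of one utterance."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, frames)
    return mfcc.mean(axis=1)                                 # pool over time

# Stand-in data: in practice, load labelled utterances from an SER database
# with librosa.load(); here we synthesize one-second tones instead.
rng = np.random.default_rng(0)
def tone(freq: float) -> np.ndarray:
    t = np.linspace(0.0, 1.0, SR, endpoint=False)
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(SR)

signals = [tone(220), tone(220), tone(440), tone(440)]
labels = ["neutral", "neutral", "angry", "angry"]  # hypothetical labels

X = np.stack([mfcc_features(s) for s in signals])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print(clf.predict(X))  # sanity check on the training data itself
```

With a real database, the synthetic tones would be replaced by loaded waveforms, the feature function by the chosen acoustic representation, and the SVM by whichever classifier (or multiple-classifier fusion) the system design calls for.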


Data Availability

Enquiries about data availability should be directed to the authors.


Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author information


Corresponding authors

Correspondence to Mohammed Jawad Al-Dujaili or Abbas Ebrahimi-Moghadam.

Ethics declarations

Conflict of interest

We have no conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Al-Dujaili, M.J., Ebrahimi-Moghadam, A. Speech Emotion Recognition: A Comprehensive Survey. Wireless Pers Commun 129, 2525–2561 (2023). https://doi.org/10.1007/s11277-023-10244-3
