Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011

Published in Artificial Intelligence Review

Abstract

Speaker emotion recognition is achieved through a processing pipeline that isolates the speech signal, extracts selected features and performs the final classification. Acoustically, speech processing techniques offer valuable paralinguistic information, derived mainly from prosodic and spectral features. In some cases the process is assisted by speech recognition systems, which contribute linguistic information to the classification. Both frameworks address a very challenging problem, as emotional states have no clear-cut boundaries and often differ from person to person. This article surveys and classifies research papers that investigate emotion recognition from the audio channel, based mostly on the features they extract and select and on their classification methodology. Topics important to any classification approach, such as the databases available for experimentation, appropriate feature extraction and selection methods, classifiers and performance issues, are discussed, with emphasis on research published in the last decade. The survey closes with a discussion of open trends and directions for future research on this topic.
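As a concrete illustration of the pipeline the abstract describes, the sketch below extracts utterance-level prosodic features (pitch and energy statistics) and spectral features (MFCC statistics) and feeds them to a support vector machine, one of the classifier families covered in the survey. This is a minimal sketch under stated assumptions, not a method from any surveyed paper: librosa and scikit-learn are assumed as the toolkits, and the file names and labels are hypothetical placeholders.

```python
# Minimal sketch of a speech emotion recognition pipeline, assuming the
# librosa and scikit-learn libraries; file names and labels below are
# hypothetical placeholders, not data from the survey.
import numpy as np
import librosa
from sklearn.svm import SVC

def utterance_features(path, sr=16000):
    """Summarise one utterance as a fixed-length feature vector."""
    y, _ = librosa.load(path, sr=sr)
    # Prosodic contours: fundamental frequency (pitch) and short-time energy.
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)
    rms = librosa.feature.rms(y=y)[0]
    # Spectral features: 13 mel-frequency cepstral coefficients per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Collapse each variable-length contour to utterance-level statistics,
    # a common step in the surveyed systems.
    def stats(x):
        return [np.mean(x), np.std(x), np.min(x), np.max(x)]
    return np.array(stats(f0) + stats(rms) + sum((stats(row) for row in mfcc), []))

# Hypothetical corpus of (wav file, emotion label) pairs.
corpus = [("ex_happy.wav", "happy"), ("ex_angry.wav", "angry"),
          ("ex_sad.wav", "sad"), ("ex_neutral.wav", "neutral")]
X = np.vstack([utterance_features(path) for path, _ in corpus])
labels = [label for _, label in corpus]
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:1]))  # classify the first utterance
```

In practice the surveyed systems add feature selection and speaker-independent evaluation on top of this skeleton, and many replace the SVM with HMMs, GMMs, neural networks or classifier ensembles.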

Author information

Correspondence to Christos-Nikolaos Anagnostopoulos.


About this article

Cite this article

Anagnostopoulos, C.-N., Iliou, T. & Giannoukos, I. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artif Intell Rev 43, 155–177 (2015). https://doi.org/10.1007/s10462-012-9368-5
