Emotion recognition from speech: a review

Abstract

Emotion recognition from speech has emerged as an important research area in the recent past, and a review of existing work on emotional speech processing is therefore useful for guiding further research. This paper surveys the recent literature on speech emotion recognition, covering the issues related to emotional speech corpora, the different types of speech features, and the models used to recognize emotions from speech. Thirty-two representative speech databases are reviewed with respect to their language, number of speakers, number of emotions, and purpose of collection, and the issues surrounding the emotional speech databases used for emotion recognition are briefly discussed. The literature on the different features used for recognizing emotion from speech is then presented, and the importance of choosing an appropriate classification model is discussed alongside the review. Finally, important issues for further emotion recognition research, both in general and specific to the Indian context, are highlighted wherever necessary.
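To make the scope of such systems concrete, the sketch below illustrates one typical pipeline of the kind this review surveys: utterance-level spectral features (here, MFCC statistics) fed to a statistical classifier (here, a support vector machine). This is an illustrative sketch only, not the authors' method; the file names and emotion labels are hypothetical, and librosa and scikit-learn are assumed as tooling, since the paper does not prescribe an implementation.

```python
# Minimal sketch of a speech emotion recognition pipeline, assuming librosa
# and scikit-learn are available. File names and emotion labels below are
# hypothetical placeholders, not data from the paper.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

def extract_features(wav_path, sr=16000, n_mfcc=13):
    """Utterance-level spectral features: mean and std of each MFCC coefficient."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical corpus: (wav file, emotion label) pairs, e.g. from an acted database.
corpus = [
    ("anger_001.wav", "anger"),
    ("happiness_001.wav", "happiness"),
    ("sadness_001.wav", "sadness"),
    # ... one entry per labelled utterance
]

X = np.array([extract_features(path) for path, _ in corpus])
y = np.array([label for _, label in corpus])

# An SVM is one of several model families appearing in the surveyed
# literature (alongside GMMs, HMMs, and neural networks).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

In a realistic system the feature set would typically be broader, taking in prosodic contours, voice-quality, and excitation-source measures, and the corpus would be one of the emotional speech databases the review catalogues.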


Author information

Corresponding author

Correspondence to Shashidhar G. Koolagudi.

About this article

Cite this article

Koolagudi, S.G., Rao, K.S. Emotion recognition from speech: a review. Int J Speech Technol 15, 99–117 (2012). https://doi.org/10.1007/s10772-011-9125-1
