Emotion recognition and adaptation in spoken dialogue systems

International Journal of Speech Technology

Abstract

The integration of emotional states into intelligent spoken human-computer interfaces has evolved into an active field of research. In this article we describe enhancements and optimizations of a speech-based emotion recognizer operating jointly with automatic speech recognition. We argue that knowledge about the textual content of an utterance can improve the recognition of its emotional content. Having outlined the experimental setup, we present results and demonstrate the capability of a post-processing algorithm that combines multiple speech-emotion recognizers. For dialogue management we propose a stochastic approach comprising a dialogue model and an emotional model that interact in a combined dialogue-emotion model. These models are trained on dialogue corpora and, assigned different weighting factors, determine the course of the dialogue.
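The two mechanisms named in the abstract — combining multiple emotion recognizers in a post-processing step, and interpolating a dialogue model with an emotional model via weighting factors — can be illustrated with a minimal sketch. All function names, labels, weights, and probabilities below are hypothetical illustrations, not the authors' implementation:

```python
from collections import defaultdict

def combine_recognizers(hypotheses, weights):
    """Weighted voting over the emotion labels produced by several
    recognizers: each recognizer's label is credited with its weight,
    and the label with the highest total score wins."""
    scores = defaultdict(float)
    for label, weight in zip(hypotheses, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

def combined_score(p_dialogue, p_emotion, lam=0.7):
    """Linear interpolation of a dialogue-model score and an
    emotion-model score; lam is the weighting factor that shifts
    influence between the two models."""
    return lam * p_dialogue + (1.0 - lam) * p_emotion

# Hypothetical example: three recognizers vote on one utterance.
label = combine_recognizers(["anger", "anger", "neutral"],
                            [0.5, 0.3, 0.2])
# "anger" accumulates 0.8 versus 0.2 for "neutral", so it is chosen.
```

With lam close to 1 the course of the dialogue is driven mostly by the dialogue model; lowering lam lets the recognized emotional state reshape the system's next move.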




Correspondence to Wolfgang Minker.


Cite this article

Pittermann, J., Pittermann, A. & Minker, W. Emotion recognition and adaptation in spoken dialogue systems. Int J Speech Technol 13, 49–60 (2010). https://doi.org/10.1007/s10772-010-9068-y
