Abstract
The involvement of emotional states in intelligent spoken human-computer interfaces has evolved into an active field of research. In this article we describe enhancements and optimizations of a speech-based emotion recognizer operating jointly with automatic speech recognition. We argue that knowledge of the textual content of an utterance can improve the recognition of its emotional content. After outlining the experimental setup, we present results and demonstrate the capability of a post-processing algorithm that combines multiple speech-emotion recognizers. For dialogue management we propose a stochastic approach comprising a dialogue model and an emotional model that interact within a combined dialogue-emotion model. These models are trained on dialogue corpora and, weighted by adjustable factors, jointly determine the course of the dialogue.
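The two combination steps named in the abstract can be illustrated with a minimal sketch. The function names, the choice of majority voting, the log-linear combination rule, and the weight values below are all assumptions for illustration, not the authors' exact formulation:

```python
import math
from collections import Counter

def vote(labels):
    """ROVER-style post-processing: majority vote over the emotion
    labels produced by several independent recognizers."""
    return Counter(labels).most_common(1)[0][0]

def combined_score(p_dialogue, p_emotion, w_d=0.7, w_e=0.3):
    """Log-linear combination of dialogue-model and emotion-model
    probabilities; the weighting factors control each model's
    influence on the course of the dialogue."""
    return w_d * math.log(p_dialogue) + w_e * math.log(p_emotion)

# Three hypothetical recognizers disagree; the vote resolves it.
print(vote(["anger", "anger", "neutral"]))  # -> anger

# Pick the dialogue move maximizing the combined score.
candidates = {
    "confirm":   (0.6, 0.2),  # (P from dialogue model, P from emotion model)
    "apologize": (0.3, 0.7),
}
best = max(candidates, key=lambda s: combined_score(*candidates[s]))
print(best)  # -> confirm
```

Raising `w_e` relative to `w_d` would shift such a system toward emotion-driven moves (e.g. an apology when anger is detected), which is one way to read the abstract's claim that the weighting factors determine the course of the dialogue.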
Cite this article
Pittermann, J., Pittermann, A. & Minker, W. Emotion recognition and adaptation in spoken dialogue systems. Int J Speech Technol 13, 49–60 (2010). https://doi.org/10.1007/s10772-010-9068-y