Abstract
The involvement of emotional states in intelligent spoken human-computer interfaces has evolved into an active field of research. In this article we describe enhancements and optimizations of a speech-based emotion recognizer operating jointly with automatic speech recognition. We argue that knowledge of the textual content of an utterance can improve the recognition of its emotional content. After outlining the experimental setup, we present results and demonstrate the capability of a post-processing algorithm that combines multiple speech-emotion recognizers. For dialogue management we propose a stochastic approach comprising a dialogue model and an emotional model that interact within a combined dialogue-emotion model. These models are trained on dialogue corpora and, weighted by adjustable factors, jointly determine the course of the dialogue.
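The two combination steps named in the abstract can be illustrated with a minimal sketch. The function names, the choice of majority voting, the log-linear combination rule, and the weight values below are all assumptions for illustration, not the authors' exact formulation:

```python
import math
from collections import Counter

def vote(labels):
    """ROVER-style post-processing: majority vote over the emotion
    labels produced by several independent recognizers."""
    return Counter(labels).most_common(1)[0][0]

def combined_score(p_dialogue, p_emotion, w_d=0.7, w_e=0.3):
    """Log-linear combination of dialogue-model and emotion-model
    probabilities; the weighting factors control each model's
    influence on the course of the dialogue."""
    return w_d * math.log(p_dialogue) + w_e * math.log(p_emotion)

# Three hypothetical recognizers disagree; the vote resolves it.
print(vote(["anger", "anger", "neutral"]))  # -> anger

# Pick the dialogue move maximizing the combined score.
candidates = {
    "confirm":   (0.6, 0.2),  # (P from dialogue model, P from emotion model)
    "apologize": (0.3, 0.7),
}
best = max(candidates, key=lambda s: combined_score(*candidates[s]))
print(best)  # -> confirm
```

Raising `w_e` relative to `w_d` would shift such a system toward emotion-driven moves (e.g. an apology when anger is detected), which is one way to read the abstract's claim that the weighting factors determine the course of the dialogue.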
Cite this article
Pittermann, J., Pittermann, A. & Minker, W. Emotion recognition and adaptation in spoken dialogue systems. Int J Speech Technol 13, 49–60 (2010). https://doi.org/10.1007/s10772-010-9068-y