Abstract
In this article we present an overview of our current research on developing advanced spoken language dialogue systems. These systems need to react flexibly and adaptively to the current state of the user and the situation of use. In particular, they require emotion recognition and adaptive dialogue management techniques. Advanced dialogue systems also need proactive capabilities so that they can act as intelligent assistants to their users.
Cite this article
Minker, W., Pittermann, J., Pittermann, A. et al. Challenges in speech-based human–computer interfaces. Int J Speech Technol 10, 109–119 (2007). https://doi.org/10.1007/s10772-009-9023-y