
Challenges in speech-based human–computer interfaces


Abstract

In this article we present an overview of our current research activities in the development of advanced spoken language dialogue systems. These systems need to react flexibly and adaptively to the current state of the user and the situation of use. In particular, they require emotion recognition and adaptive dialogue management techniques. Advanced dialogue systems also need proactive capabilities so that they can act as intelligent assistants to their users.
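
To illustrate the interplay between emotion recognition and adaptive dialogue management mentioned above, the following is a minimal, hypothetical sketch in Python. The class names, emotion labels and confidence threshold are assumptions introduced here for illustration only; they are not taken from the system described in the article. The idea shown is that an emotion estimate for the last user turn steers the dialogue strategy, for instance by switching to explicit confirmations and system initiative when the user appears annoyed.

```python
# A minimal, hypothetical sketch of emotion-adaptive dialogue management.
# Names, labels and the threshold are illustrative assumptions, not the
# components of the system described in the article.
from dataclasses import dataclass


@dataclass
class EmotionEstimate:
    label: str          # e.g. "neutral", "anger", "joy", "sadness"
    confidence: float   # classifier confidence in [0, 1]


class AdaptiveDialogueManager:
    """Toy dialogue manager that adapts its strategy to the user's emotional state."""

    def __init__(self) -> None:
        # Strategy parameters adjusted at run time.
        self.confirmation = "implicit"   # "implicit" or "explicit" confirmation
        self.initiative = "mixed"        # "system", "mixed" or "user" initiative

    def adapt(self, emotion: EmotionEstimate) -> None:
        """Adjust confirmation and initiative based on the recognized emotion."""
        if emotion.confidence < 0.5:
            return  # estimate too unreliable, keep the current strategy
        if emotion.label == "anger":
            # An annoyed user gets explicit confirmations and tighter system guidance.
            self.confirmation = "explicit"
            self.initiative = "system"
        elif emotion.label in ("neutral", "joy"):
            # A relaxed user can be given more freedom.
            self.confirmation = "implicit"
            self.initiative = "user"

    def next_prompt(self, slot: str, value: str) -> str:
        """Generate the next system prompt according to the current strategy."""
        if self.confirmation == "explicit":
            return f"Did you say {value} for the {slot}? Please answer yes or no."
        return f"Okay, {value}. What else can I do for you?"


if __name__ == "__main__":
    dm = AdaptiveDialogueManager()
    # Pretend an emotion classifier labelled the last user turn as angry.
    dm.adapt(EmotionEstimate(label="anger", confidence=0.8))
    print(dm.next_prompt("destination", "Ulm"))
```

In a real system the emotion estimate would typically come from an acoustic classifier operating on features such as pitch and energy, and the strategy parameters would in turn influence prompt generation and confirmation behaviour rather than a single hard-coded prompt.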




Author information

Correspondence to Wolfgang Minker.



Cite this article

Minker, W., Pittermann, J., Pittermann, A. et al. Challenges in speech-based human–computer interfaces. Int J Speech Technol 10, 109–119 (2007). https://doi.org/10.1007/s10772-009-9023-y

