Constraining User Response via Multimodal Dialog Interface


Abstract

This paper presents the results of an experiment comparing two designs of an automated dialog interface: a multimodal design that coordinates on-screen text displays with spoken prompts, and a voice-only version of the same application. Our results show that the text-coordinated version yields better word recognition and fewer out-of-grammar responses, and matches the voice-only version in user satisfaction. We argue that this type of multimodal dialog interface effectively constrains user responses, allowing for better speech recognition without increasing cognitive load or compromising the naturalness of the interaction.
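
As a concrete illustration of the mechanism described above, the Python sketch below shows one dialog turn in which the recognition grammar is assumed to be exactly the set of displayed options. This is a minimal sketch, not the authors' implementation: the display, speak, and recognize functions are hypothetical console stubs standing in for a real GUI/TTS/ASR stack.

    def display(options):
        """Stub for the coordinated text display: render the legal responses."""
        print("Options:", " | ".join(options))

    def speak(prompt):
        """Stub for text-to-speech: the real system plays a spoken prompt."""
        print("[spoken]", prompt)

    def recognize(grammar):
        """Stub for ASR: a real system would pass the grammar to the
        recognizer; here we simply read typed input."""
        return input("> ").strip().lower()

    def multimodal_turn(prompt, options):
        """One dialog turn: a text display coordinated with a spoken prompt.

        The recognition grammar is exactly the set of displayed options, so
        what the user sees bounds what the recognizer must handle. Returns
        the utterance and whether it was in-grammar -- out-of-grammar
        responses being the error class the text display is meant to reduce.
        """
        grammar = {opt.lower() for opt in options}
        display(options)   # user sees the legal responses
        speak(prompt)      # user hears the prompt at the same time
        utterance = recognize(grammar)
        return utterance, utterance in grammar

    if __name__ == "__main__":
        utterance, in_grammar = multimodal_turn(
            "Which department do you need?",
            ["Sales", "Billing", "Technical Support"])
        print(("in-grammar" if in_grammar else "out-of-grammar"),
              "response:", utterance)

In the voice-only variant, speak would read the options aloud instead of display showing them; the grammar stays the same, but nothing visible bounds the user's phrasing, which is where out-of-grammar responses arise.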



Cite this article

Baker, K., Mckenzie, A., Biermann, A. et al. Constraining User Response via Multimodal Dialog Interface. International Journal of Speech Technology 7, 251–258 (2004). https://doi.org/10.1023/B:IJST.0000037069.82313.57
