Constraining User Response via Multimodal Dialog Interface

Baker, Kirk; Mckenzie, Ashley; Biermann, Alan; Webelhuth, Gert

doi:10.1023/B:IJST.0000037069.82313.57

Published: October 2004

Volume 7, pages 251–258, (2004)

International Journal of Speech Technology


Abstract

This paper presents the results of an experiment comparing two different designs of an automated dialog interface. We compare a multimodal design utilizing text displays coordinated with spoken prompts to a voice-only version of the same application. Our results show that the text-coordinated version is more efficient in terms of word recognition and number of out-of-grammar responses, and is equal to the voice-only version in terms of user satisfaction. We argue that this type of multimodal dialog interface effectively constrains user response to allow for better speech recognition without increasing cognitive load or compromising the naturalness of the interaction.


Cite this article

Baker, K., Mckenzie, A., Biermann, A. et al. Constraining User Response via Multimodal Dialog Interface. International Journal of Speech Technology 7, 251–258 (2004). https://doi.org/10.1023/B:IJST.0000037069.82313.57
