Abstract
Users often have tasks that can be accomplished with the aid of multiple media, such as text, sound, and pictures. For example, an urban navigation route can be communicated with both pictures and text. Today's mobile devices have multimedia capabilities: cell phones have cameras, displays, sound output, and (soon) speech recognition. Potentially, these capabilities can be used for multimedia-intensive tasks, but two obstacles stand in the way. First, recognition of visual input and speech remains unreliable. Second, the mechanics of integrating multiple media and recognition systems remain daunting for users. We address both issues in MARCO, a Multimodal Agent for Route Construction. MARCO collects route information by taking pictures of landmarks, accompanied by verbal directions. We combine results from off-the-shelf speech recognition and optical character recognition to achieve better recognition of route landmarks than either recognition system achieves alone. MARCO then automatically produces an illustrated, step-by-step guide to the route.
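The core idea of combining two unreliable recognizers can be sketched in a few lines. This is an illustrative assumption, not MARCO's actual algorithm: the function name `fuse_hypotheses`, the example n-best lists, and the scoring scheme (summing normalized confidences over shared candidates) are all hypothetical, standing in for whatever fusion the system performs over its speech and OCR hypothesis lists.

```python
# Hedged sketch of mutual disambiguation between two recognizers.
# Each recognizer returns an n-best list of (candidate, confidence)
# pairs; a hypothesis supported by both recognizers accumulates
# evidence and can outrank each recognizer's individual top choice.

def fuse_hypotheses(speech_nbest, ocr_nbest):
    """Merge two ranked hypothesis lists and sort candidates by
    combined (per-recognizer normalized) confidence."""
    combined = {}
    for nbest in (speech_nbest, ocr_nbest):
        total = sum(conf for _, conf in nbest) or 1.0
        for candidate, conf in nbest:
            combined[candidate] = combined.get(candidate, 0.0) + conf / total
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical example: neither recognizer alone ranks the landmark
# name "STARBUCKS" first, but the shared hypothesis wins after fusion.
speech = [("START BOOKS", 0.5), ("STARBUCKS", 0.4)]
ocr = [("5TARBUCK5", 0.5), ("STARBUCKS", 0.45)]
best, score = fuse_hypotheses(speech, ocr)[0]
print(best)  # prints "STARBUCKS"
```

The design point the paper argues is visible here: agreement between independent error-prone channels is itself a strong signal, so fusing n-best lists can beat either recognizer's single best guess.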
Lieberman, H., Chu, A. An interface for mutual disambiguation of recognition errors in a multimodal navigational assistant. Multimedia Systems 12, 393–402 (2007). https://doi.org/10.1007/s00530-006-0052-y