Abstract
Users often have tasks that can be accomplished with the aid of multiple media, such as text, sound, and pictures. For example, an urban navigation route can be communicated with both pictures and text. Today's mobile devices have multimedia capabilities: cell phones have cameras, displays, sound output, and (soon) speech recognition. Potentially, these capabilities can be used for multimedia-intensive tasks, but two obstacles stand in the way. First, recognition of visual input and speech remains unreliable. Second, the mechanics of integrating multiple media and recognition systems remain daunting for users. We address both issues in MARCO, a Multimodal Agent for Route Construction. MARCO collects route information by taking pictures of landmarks, accompanied by verbal directions. We combine results from off-the-shelf speech recognition and optical character recognition to achieve better recognition of route landmarks than either recognition system achieves alone. MARCO then automatically produces an illustrated, step-by-step guide to the route.
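The core idea of combining two unreliable recognizers can be sketched in a few lines. This is an illustrative assumption, not MARCO's actual algorithm: the function name `fuse_hypotheses`, the example n-best lists, and the scoring scheme (summing normalized confidences over shared candidates) are all hypothetical, standing in for whatever fusion the system performs over its speech and OCR hypothesis lists.

```python
# Hedged sketch of mutual disambiguation between two recognizers.
# Each recognizer returns an n-best list of (candidate, confidence)
# pairs; a hypothesis supported by both recognizers accumulates
# evidence and can outrank each recognizer's individual top choice.

def fuse_hypotheses(speech_nbest, ocr_nbest):
    """Merge two ranked hypothesis lists and sort candidates by
    combined (per-recognizer normalized) confidence."""
    combined = {}
    for nbest in (speech_nbest, ocr_nbest):
        total = sum(conf for _, conf in nbest) or 1.0
        for candidate, conf in nbest:
            combined[candidate] = combined.get(candidate, 0.0) + conf / total
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical example: neither recognizer alone ranks the landmark
# name "STARBUCKS" first, but the shared hypothesis wins after fusion.
speech = [("START BOOKS", 0.5), ("STARBUCKS", 0.4)]
ocr = [("5TARBUCK5", 0.5), ("STARBUCKS", 0.45)]
best, score = fuse_hypotheses(speech, ocr)[0]
print(best)  # prints "STARBUCKS"
```

The design point the paper argues is visible here: agreement between independent error-prone channels is itself a strong signal, so fusing n-best lists can beat either recognizer's single best guess.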
Lieberman, H., Chu, A. An interface for mutual disambiguation of recognition errors in a multimodal navigational assistant. Multimedia Systems 12, 393–402 (2007). https://doi.org/10.1007/s00530-006-0052-y