
An interface for mutual disambiguation of recognition errors in a multimodal navigational assistant

  • Regular paper
  • Published in: Multimedia Systems

Abstract

Users often have tasks that are best accomplished with the aid of multiple media, such as text, sound, and pictures. An urban navigation route, for example, can be communicated with pictures and text. Today’s mobile devices have multimedia capabilities: cell phones have cameras, displays, sound output, and (soon) speech recognition. Potentially, these capabilities can be applied to media-intensive tasks, but two obstacles stand in the way. First, recognition of visual input and of speech remains unreliable. Second, the mechanics of integrating multiple media and recognition systems remain daunting for users. We address both issues in MARCO, a multimodal agent for route construction. MARCO collects route information from pictures of landmarks accompanied by verbal directions. By combining results from off-the-shelf speech recognition and optical character recognition, it recognizes route landmarks more reliably than either recognition system alone, and it automatically produces an illustrated, step-by-step guide to the route.
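
The combination of recognizers described above is an instance of mutual disambiguation: each modality’s list of candidate readings is used to re-rank the other’s. As a rough illustration of the idea (a minimal sketch in Python, not MARCO’s actual algorithm; the function name, the confidence weighting, and the choice of string-similarity measure are all assumptions), one can cross-score every pair of hypotheses and keep the pair with the strongest joint evidence:

    from difflib import SequenceMatcher

    def disambiguate(speech_hypotheses, ocr_hypotheses):
        """Cross-score the n-best lists of two unreliable recognizers.

        Each argument is a list of (text, confidence) pairs, e.g.
        [("starbucks", 0.6), ("star bus", 0.2)]. Returns the landmark
        label with the strongest combined evidence.
        """
        best_label, best_score = None, 0.0
        for s_text, s_conf in speech_hypotheses:
            for o_text, o_conf in ocr_hypotheses:
                # Reward hypotheses that both recognizers agree on:
                # joint score = string similarity x both confidences.
                sim = SequenceMatcher(None, s_text.lower(), o_text.lower()).ratio()
                score = sim * s_conf * o_conf
                if score > best_score:
                    # Prefer the OCR spelling: it preserves the sign's
                    # actual orthography, while speech output is phonetic.
                    best_label, best_score = o_text, score
        return best_label, best_score

    # Speech mishears the landmark name, but OCR of the storefront sign
    # agrees closely enough with one speech hypothesis to recover it.
    speech = [("a bun pan", 0.4), ("oh bon pain", 0.3)]
    ocr = [("Au Bon Pain", 0.5), ("Au Bon Pam", 0.2)]
    print(disambiguate(speech, ocr))  # -> ('Au Bon Pain', <score>)

In this toy example either recognizer alone might rank a wrong string first; the cross-product scoring selects the reading supported by both modalities, mirroring the improvement over either recognizer alone that the abstract claims.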



Author information

Corresponding author

Correspondence to Henry Lieberman.

About this article

Cite this article

Lieberman, H., Chu, A. An interface for mutual disambiguation of recognition errors in a multimodal navigational assistant. Multimedia Systems 12, 393–402 (2007). https://doi.org/10.1007/s00530-006-0052-y


Keywords

Navigation