Flexible and robust multimodal interfaces for universal access

  • Special issue on multimodality: a step towards universal access

Universal Access in the Information Society

Abstract

Multimodal interfaces are inherently flexible, a key feature that makes them well suited to both universal access and next-generation mobile computing. Recent studies have also demonstrated that multimodal architectures can improve the performance stability and overall robustness of the recognition-based component technologies they incorporate (e.g., speech, vision, pen input). This paper reviews data from two recent studies in which a multimodal architecture suppressed errors and stabilized system performance for accented speakers and during mobile use. It concludes with a discussion of key issues in the design of future multimodal interfaces for diverse user groups.
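The error suppression described above arises from fusing parallel recognizers so that each modality's hypotheses can correct the other's, a mechanism Oviatt calls mutual disambiguation. The sketch below is a minimal, hypothetical illustration of that idea in Python, not the paper's actual architecture: the commands, gestures, confidence scores, and compatibility table are all invented for illustration.

```python
from itertools import product

# n-best hypotheses from a speech recognizer: (command, confidence).
# The top-ranked speech hypothesis here is wrong.
speech_nbest = [("pan out", 0.48), ("zoom out", 0.42), ("pan left", 0.10)]

# n-best hypotheses from a pen/gesture recognizer: (gesture, confidence).
gesture_nbest = [("area", 0.55), ("point", 0.45)]

# Stand-in for a multimodal integration grammar: which command/gesture
# pairs form a semantically legal joint interpretation.
compatible = {("zoom out", "area"), ("pan left", "point")}

def fuse(speech, gesture, grammar):
    """Return the best-scoring jointly legal (command, gesture) pair."""
    best, best_score = None, 0.0
    for (cmd, p_s), (ges, p_g) in product(speech, gesture):
        if (cmd, ges) in grammar:
            score = p_s * p_g  # naive joint score; real systems weight modalities
            if score > best_score:
                best, best_score = (cmd, ges), score
    return best

print(fuse(speech_nbest, gesture_nbest, compatible))
# ('zoom out', 'area') -- the gesture evidence promotes the second-ranked
# speech hypothesis, so the fused result suppresses the speech error.
```

In this toy setup, neither recognizer is corrected in isolation; it is the joint constraint that recovers the intended interpretation, which is the kind of cross-modal compensation the studies reviewed here report for accented speakers and mobile settings.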



Author information

Correspondence to S. Oviatt.

Cite this article

Oviatt, S. Flexible and robust multimodal interfaces for universal access. UAIS 2, 91–95 (2003). https://doi.org/10.1007/s10209-002-0041-7
