
Abstract

Recent research in multimodal speech and gesture input for Augmented Reality (AR) applications is described. Although multimodal input has been researched for desktop applications, 3D graphics, and virtual reality interfaces, there have been very few multimodal AR applications. We review previous work in the area and then present our own multimodal interfaces and the user studies conducted with those interfaces. Based on these studies, we provide design guidelines for developing effective multimodal AR experiences.
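As a rough illustration of the kind of speech-and-gesture fusion the chapter surveys, the sketch below pairs a deictic speech command ("move that") with the pointing gesture nearest in time, using a fixed fusion window. This is a minimal, hypothetical example: the event types, function names, and the time-window strategy are assumptions for illustration, not the chapter's actual implementation (real systems typically use richer constraint-based or probabilistic fusion).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical event types for a "put-that-there"-style interface.
# All names here are illustrative, not taken from the chapter's system.

@dataclass
class SpeechEvent:
    command: str        # e.g. "move", "rotate"
    deictic: str        # e.g. "that", "there"
    timestamp: float    # seconds since session start

@dataclass
class GestureEvent:
    target_id: str      # object or location the user pointed at
    timestamp: float

def fuse(speech: SpeechEvent,
         gestures: List[GestureEvent],
         window: float = 1.5) -> Optional[str]:
    """Resolve a deictic speech command against the gesture closest in
    time, provided it falls within a fixed fusion window. Returns a
    combined command string, or None if no gesture is close enough."""
    candidates = [g for g in gestures
                  if abs(g.timestamp - speech.timestamp) <= window]
    if not candidates:
        return None  # no temporally aligned gesture: prompt the user again
    nearest = min(candidates,
                  key=lambda g: abs(g.timestamp - speech.timestamp))
    return f"{speech.command}({nearest.target_id})"

if __name__ == "__main__":
    speech = SpeechEvent(command="move", deictic="that", timestamp=10.2)
    gestures = [GestureEvent("chair_3", 9.9), GestureEvent("table_1", 12.6)]
    print(fuse(speech, gestures))  # -> move(chair_3)
```

The fixed window and nearest-in-time pairing keep the example short; the trade-off is that fast command sequences can mis-bind, which is one reason the mutual-disambiguation and constraint-based approaches discussed in the chapter exist.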



Author information

Correspondence to Mark Billinghurst.

Copyright information

© 2012 Springer-Verlag London Limited

About this chapter

Cite this chapter

Billinghurst, M., Lee, M. (2012). Multimodal Interfaces for Augmented Reality. In: Dill, J., Earnshaw, R., Kasik, D., Vince, J., Wong, P. (eds) Expanding the Frontiers of Visual Analytics and Visualization. Springer, London. https://doi.org/10.1007/978-1-4471-2804-5_25

  • DOI: https://doi.org/10.1007/978-1-4471-2804-5_25

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2803-8

  • Online ISBN: 978-1-4471-2804-5

  • eBook Packages: Computer Science, Computer Science (R0)
