
Abstract

Recent research in multimodal speech and gesture input for Augmented Reality (AR) applications is described. Although multimodal input has been researched for desktop applications, 3D graphics, and virtual reality interfaces, there have been very few multimodal AR applications. We review previous work in the area and then present our own multimodal interfaces and the user studies conducted with those interfaces. Based on these studies, we provide design guidelines for developing effective multimodal AR experiences.
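As a rough illustration of the kind of speech-and-gesture fusion the chapter surveys, the sketch below pairs a deictic speech command ("move that") with the pointing gesture nearest in time, using a fixed fusion window. This is a minimal, hypothetical example: the event types, function names, and the time-window strategy are assumptions for illustration, not the chapter's actual implementation (real systems typically use richer constraint-based or probabilistic fusion).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical event types for a "put-that-there"-style interface.
# All names here are illustrative, not taken from the chapter's system.

@dataclass
class SpeechEvent:
    command: str        # e.g. "move", "rotate"
    deictic: str        # e.g. "that", "there"
    timestamp: float    # seconds since session start

@dataclass
class GestureEvent:
    target_id: str      # object or location the user pointed at
    timestamp: float

def fuse(speech: SpeechEvent,
         gestures: List[GestureEvent],
         window: float = 1.5) -> Optional[str]:
    """Resolve a deictic speech command against the gesture closest in
    time, provided it falls within a fixed fusion window. Returns a
    combined command string, or None if no gesture is close enough."""
    candidates = [g for g in gestures
                  if abs(g.timestamp - speech.timestamp) <= window]
    if not candidates:
        return None  # no temporally aligned gesture: prompt the user again
    nearest = min(candidates,
                  key=lambda g: abs(g.timestamp - speech.timestamp))
    return f"{speech.command}({nearest.target_id})"

if __name__ == "__main__":
    speech = SpeechEvent(command="move", deictic="that", timestamp=10.2)
    gestures = [GestureEvent("chair_3", 9.9), GestureEvent("table_1", 12.6)]
    print(fuse(speech, gestures))  # -> move(chair_3)
```

The fixed window and nearest-in-time pairing keep the example short; the trade-off is that fast command sequences can mis-bind, which is one reason the mutual-disambiguation and constraint-based approaches discussed in the chapter exist.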



Author information

Correspondence to Mark Billinghurst.

Copyright information

© 2012 Springer-Verlag London Limited

About this chapter

Cite this chapter

Billinghurst, M., Lee, M. (2012). Multimodal Interfaces for Augmented Reality. In: Dill, J., Earnshaw, R., Kasik, D., Vince, J., Wong, P. (eds) Expanding the Frontiers of Visual Analytics and Visualization. Springer, London. https://doi.org/10.1007/978-1-4471-2804-5_25

  • DOI: https://doi.org/10.1007/978-1-4471-2804-5_25

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2803-8

  • Online ISBN: 978-1-4471-2804-5

  • eBook Packages: Computer Science, Computer Science (R0)
