
A usability study of multimodal input in an augmented reality environment

  • Original Article
  • Published:
Virtual Reality

Abstract

In this paper, we describe a user study evaluating the usability of an augmented reality (AR) multimodal interface (MMI). We have developed an AR MMI that combines free-hand gesture and speech input in a natural way using a multimodal fusion architecture. We describe the system architecture and present a study exploring the usability of the AR MMI compared with speech-only and 3D-hand-gesture-only interaction conditions. The interface was used in an AR application for selecting 3D virtual objects and changing their shape and color. For each interface condition, we measured task completion time, the number of user and system errors, and user satisfaction. We found that the MMI was more usable than the gesture-only condition, and users felt that the MMI was more satisfying to use than the speech-only condition; however, it was neither more effective nor more efficient than the speech-only interface. We discuss the implications of this research for designing AR MMIs and outline directions for future work. The findings could also be used to help develop MMIs for a wider range of AR applications, for example, AR navigation tasks, mobile AR interfaces, or AR games.
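To make the idea of a multimodal fusion architecture concrete, the sketch below shows one common way such systems pair the two input streams: a recognized speech command is fused with the temporally closest hand-gesture event inside a short time window. This is a minimal illustration only, not the paper's implementation; the event types, field names, and the 1.5-second window are assumptions chosen for the example.

    from dataclasses import dataclass
    from typing import Optional

    # Illustrative event types; the fields are assumptions, not the paper's API.
    @dataclass
    class SpeechEvent:
        timestamp: float   # seconds
        action: str        # e.g. "change_color"
        value: str         # e.g. "red"

    @dataclass
    class GestureEvent:
        timestamp: float
        target_id: int     # id of the 3D object the hand points at

    @dataclass
    class FusedCommand:
        target_id: int
        action: str
        value: str

    FUSION_WINDOW = 1.5    # seconds; an assumed threshold for pairing inputs

    def fuse(speech: SpeechEvent, gesture: GestureEvent) -> Optional[FusedCommand]:
        """Pair a speech command with a gesture that arrives within the fusion
        window; otherwise return None and treat the inputs as unimodal."""
        if abs(speech.timestamp - gesture.timestamp) <= FUSION_WINDOW:
            return FusedCommand(gesture.target_id, speech.action, speech.value)
        return None

    # Example: "make that red" spoken while pointing at object 7.
    cmd = fuse(SpeechEvent(10.2, "change_color", "red"), GestureEvent(10.6, 7))
    print(cmd)  # FusedCommand(target_id=7, action='change_color', value='red')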



Acknowledgments

This work was supported by the DigiLog Miniature Augmented Reality Research Program funded by the KAIST Research Foundation. It was also supported by the Global Frontier R&D Program on <Human-centered Interaction for Coexistence>, funded by a National Research Foundation of Korea grant from the Korean Government (MSIP) (NRF-2010-0029751).

Author information

Corresponding author

Correspondence to Minkyung Lee.

About this article

Cite this article

Lee, M., Billinghurst, M., Baek, W. et al. A usability study of multimodal input in an augmented reality environment. Virtual Reality 17, 293–305 (2013). https://doi.org/10.1007/s10055-013-0230-0
