ABSTRACT
Progress in computer vision and speech recognition technologies has recently enabled multimodal interfaces that use speech and gestures. These technologies offer promising alternatives to existing interfaces because they emulate the natural way in which humans communicate. However, no systematic work has been reported that formally evaluates the new speech/gesture interfaces. This paper is concerned with the formal experimental evaluation of new human-computer interactions enabled by speech and hand gestures. The paper describes an experiment, conducted with 23 subjects, that evaluates selection strategies for interaction with large screen displays. The multimodal interface designed for this experiment does not require the user to be in physical contact with any device: video cameras and long-range microphones serve as the system's input. Three selection strategies are evaluated, and results for different target sizes and positions are reported in terms of accuracy, selection time, and user preference. Design implications for vision/speech-based interfaces are inferred from these results. The study also raises new questions and topics for future research.
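The abstract does not name the three selection strategies, but the setup it describes (camera-tracked hand pointing combined with spoken commands, with accuracy and selection time logged per trial) implies a fusion loop of roughly the following shape. The Python sketch below is purely illustrative and is not the authors' implementation: the point-and-speak strategy, the circular targets, and every class and event name are assumptions, and the camera and microphone are replaced by canned event streams so the logic runs standalone.

```python
"""Illustrative sketch of one point-and-speak selection trial.

Hypothetical stand-in for the paper's system: real video-based hand
tracking and long-range speech recognition are simulated with fixed
(timestamp, ...) event lists.
"""
import math
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    x: float       # screen position (pixels)
    y: float
    radius: float  # circular target size (pixels)


def hit_target(targets, px, py):
    """Return the target under the pointer, or None on a miss."""
    for t in targets:
        if math.hypot(px - t.x, py - t.y) <= t.radius:
            return t
    return None


def run_trial(targets, goal, pointer_stream, speech_stream):
    """Fuse the two modalities: the hand pointer aims, the spoken
    'select' command commits. Returns (hit, selection_time_s)."""
    start = None
    px = py = None
    speech = iter(speech_stream)
    next_cmd = next(speech, None)
    for t, x, y in pointer_stream:            # (time_s, x, y) samples
        if start is None:
            start = t                         # trial clock starts at first sample
        px, py = x, y
        # Consume speech events up to the current tracking timestamp.
        while next_cmd is not None and next_cmd[0] <= t:
            cmd_time, word = next_cmd
            if word == "select":              # speech commits the selection
                chosen = hit_target(targets, px, py)
                return chosen is goal, cmd_time - start
            next_cmd = next(speech, None)
    return False, None                        # trial ended without a selection


if __name__ == "__main__":
    targets = [Target("A", 200, 300, 40), Target("B", 900, 300, 40)]
    # Canned input: the hand drifts toward target B, then the user says "select".
    pointer = [(0.0, 100, 100), (0.4, 500, 250), (0.8, 890, 305), (1.2, 902, 298)]
    speech = [(1.1, "select")]
    hit, rt = run_trial(targets, targets[1], pointer, speech)
    print(f"hit={hit} selection_time={rt:.2f}s")
```

Aggregating the (hit, selection time) pairs over repeated trials at varying target sizes and positions would yield the accuracy and selection-time measures the abstract reports.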