ABSTRACT
This work explores how to use gaze and speech commands simultaneously to select an object on the screen. Multimodal systems have long been a key means of reducing the recognition errors of their individual components, yet a multimodal system generates errors of its own. The present study classifies these multimodal errors, analyzes their causes, and proposes solutions for eliminating them. The goal is to gain insight into multimodal integration errors and to develop an error self-recoverable multimodal architecture, so that error-prone recognition technologies can perform more stably and robustly within it.
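The abstract does not spell out the fusion algorithm, but the idea of letting each modality compensate for the other's recognition errors can be sketched as follows. This is a minimal, hypothetical illustration (the object model, the `select_object` function, and its scoring are assumptions, not the paper's method): the speech recognizer contributes an n-best list of candidate names, the gaze tracker contributes a fixation point, and only an object that matches a speech hypothesis *and* lies near the gaze point is selected, so either channel can veto the other's error.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class ScreenObject:
    name: str
    x: float
    y: float

def select_object(objects, gaze_xy, speech_nbest, gaze_radius=150.0):
    """Fuse a gaze fixation with a speech recognizer's n-best list.

    objects:      on-screen selectable objects
    gaze_xy:      (x, y) of the current gaze fixation in pixels
    speech_nbest: list of (word, confidence) hypotheses, best first
    Returns the object whose name appears in the n-best list and lies
    within gaze_radius of the fixation, preferring objects that are
    both close to the gaze and confidently recognized; returns None
    when the two channels cannot agree (mutual disambiguation fails).
    """
    gx, gy = gaze_xy
    best, best_score = None, float("inf")
    for word, conf in speech_nbest:
        for obj in objects:
            if obj.name != word:
                continue  # speech hypothesis names no such object
            dist = hypot(obj.x - gx, obj.y - gy)
            if dist > gaze_radius:
                continue  # gaze vetoes this speech hypothesis
            score = dist * (1.0 - conf)  # lower is better
            if score < best_score:
                best, best_score = obj, score
    return best
```

For example, if the speech recognizer's top hypothesis is wrong but the correct word is second in the n-best list, a fixation near the intended object still yields the right selection, because the misrecognized word matches no object within the gaze radius.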
Index Terms
- Overriding errors in a speech and gaze multimodal architecture