ABSTRACT
This work explores how to use gaze and speech commands simultaneously to select an object on the screen. Multimodal systems have long been a key means of reducing the recognition errors of their individual components, yet a multimodal system generates errors of its own. The present study classifies these multimodal errors, analyzes their causes, and proposes solutions for eliminating them. The goal is to gain insight into multimodal integration errors and to develop an error self-recoverable multimodal architecture, so that error-prone recognition technologies can perform more stably and robustly within it.
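The abstract does not spell out the fusion algorithm, but the idea of letting each modality compensate for the other's recognition errors can be sketched as follows. This is a minimal, hypothetical illustration (the object model, the `select_object` function, and its scoring are assumptions, not the paper's method): the speech recognizer contributes an n-best list of candidate names, the gaze tracker contributes a fixation point, and only an object that matches a speech hypothesis *and* lies near the gaze point is selected, so either channel can veto the other's error.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class ScreenObject:
    name: str
    x: float
    y: float

def select_object(objects, gaze_xy, speech_nbest, gaze_radius=150.0):
    """Fuse a gaze fixation with a speech recognizer's n-best list.

    objects:      on-screen selectable objects
    gaze_xy:      (x, y) of the current gaze fixation in pixels
    speech_nbest: list of (word, confidence) hypotheses, best first
    Returns the object whose name appears in the n-best list and lies
    within gaze_radius of the fixation, preferring objects that are
    both close to the gaze and confidently recognized; returns None
    when the two channels cannot agree (mutual disambiguation fails).
    """
    gx, gy = gaze_xy
    best, best_score = None, float("inf")
    for word, conf in speech_nbest:
        for obj in objects:
            if obj.name != word:
                continue  # speech hypothesis names no such object
            dist = hypot(obj.x - gx, obj.y - gy)
            if dist > gaze_radius:
                continue  # gaze vetoes this speech hypothesis
            score = dist * (1.0 - conf)  # lower is better
            if score < best_score:
                best, best_score = obj, score
    return best
```

For example, if the speech recognizer's top hypothesis is wrong but the correct word is second in the n-best list, a fixation near the intended object still yields the right selection, because the misrecognized word matches no object within the gaze radius.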
Index Terms
- Overriding errors in a speech and gaze multimodal architecture