ABSTRACT
As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, with higher levels of MD for the more challenging accented users. As a result, although stand-alone speech recognition performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform more robustly and stably than individual recognition technologies, and for the design of interfaces that support diversity in tangible ways and function well under challenging real-world usage conditions.
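To make the mutual-disambiguation idea concrete, the sketch below shows the basic mechanism in miniature: each recognizer emits a ranked n-best list, and fusion keeps only cross-modal pairs that are semantically compatible, so a misrecognition ranked first in one mode can be overridden by evidence from the other. This is a minimal illustration only; the n-best lists, confidence scores, `COMPATIBLE` table, and `fuse()` function are all invented for this example, standing in for the typed-feature-structure unification the actual architecture performs.

```python
# Toy illustration of mutual disambiguation (MD) over n-best lists.
# All hypotheses, scores, and the compatibility table are hypothetical;
# a real multimodal architecture unifies typed feature structures rather
# than consulting a lookup table.

from itertools import product

# Hypothetical n-best outputs as (hypothesis, recognizer confidence).
# The speech recognizer's top hypothesis is a misrecognition.
speech_nbest = [("plan the route", 0.48), ("pan the map", 0.41)]
gesture_nbest = [("drag_stroke", 0.55), ("tap", 0.20)]

# Toy stand-in for semantic unification: a spoken command combines with
# a gesture only if the pair forms a well-formed joint interpretation.
COMPATIBLE = {
    ("pan the map", "drag_stroke"),   # panning requires a drag
    ("plan the route", "tap"),        # route planning requires a point
}

def fuse(speech, gesture):
    """Rank joint interpretations; incompatible cross-modal pairs are
    filtered out, so a lower-ranked hypothesis in one mode can be
    'pulled up' by evidence from the other mode."""
    joint = [
        (s, g, s_conf * g_conf)
        for (s, s_conf), (g, g_conf) in product(speech, gesture)
        if (s, g) in COMPATIBLE
    ]
    return max(joint, key=lambda x: x[2]) if joint else None

best = fuse(speech_nbest, gesture_nbest)
print(best)  # ('pan the map', 'drag_stroke', 0.2255)
```

In this toy run, the second-ranked speech hypothesis "pan the map" wins because the top-ranked gesture is only compatible with it, which is the MD effect the abstract describes: an error in one input mode is disambiguated by complementary information from the other.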
Index Terms
- Mutual disambiguation of recognition errors in a multimodal architecture