ABSTRACT
As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, with higher levels of MD for the more challenging accented users. As a result, although stand-alone speech recognition performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform more robustly and stably than individual recognition technologies, and for the design of interfaces that support diversity in tangible ways and function well under challenging real-world usage conditions.
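To make the mutual-disambiguation idea concrete, the sketch below shows the basic mechanism in miniature: each recognizer emits a ranked n-best list, and fusion keeps only cross-modal pairs that are semantically compatible, so a misrecognition ranked first in one mode can be overridden by evidence from the other. This is a minimal illustration only; the n-best lists, confidence scores, `COMPATIBLE` table, and `fuse()` function are all invented for this example, standing in for the typed-feature-structure unification the actual architecture performs.

```python
# Toy illustration of mutual disambiguation (MD) over n-best lists.
# All hypotheses, scores, and the compatibility table are hypothetical;
# a real multimodal architecture unifies typed feature structures rather
# than consulting a lookup table.

from itertools import product

# Hypothetical n-best outputs as (hypothesis, recognizer confidence).
# The speech recognizer's top hypothesis is a misrecognition.
speech_nbest = [("plan the route", 0.48), ("pan the map", 0.41)]
gesture_nbest = [("drag_stroke", 0.55), ("tap", 0.20)]

# Toy stand-in for semantic unification: a spoken command combines with
# a gesture only if the pair forms a well-formed joint interpretation.
COMPATIBLE = {
    ("pan the map", "drag_stroke"),   # panning requires a drag
    ("plan the route", "tap"),        # route planning requires a point
}

def fuse(speech, gesture):
    """Rank joint interpretations; incompatible cross-modal pairs are
    filtered out, so a lower-ranked hypothesis in one mode can be
    'pulled up' by evidence from the other mode."""
    joint = [
        (s, g, s_conf * g_conf)
        for (s, s_conf), (g, g_conf) in product(speech, gesture)
        if (s, g) in COMPATIBLE
    ]
    return max(joint, key=lambda x: x[2]) if joint else None

best = fuse(speech_nbest, gesture_nbest)
print(best)  # ('pan the map', 'drag_stroke', 0.2255)
```

In this toy run, the second-ranked speech hypothesis "pan the map" wins because the top-ranked gesture is only compatible with it, which is the MD effect the abstract describes: an error in one input mode is disambiguated by complementary information from the other.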
Index Terms
- Mutual disambiguation of recognition errors in a multimodal architecture