
Mutual disambiguation of recognition errors in a multimodal architecture

DOI: 10.1145/302979.303163
Published: 1 May 1999

ABSTRACT

As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although speech recognition as a stand-alone performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways, and that function well under challenging real-world usage conditions.
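To make the idea of mutual disambiguation concrete, the sketch below (Python, entirely illustrative) ranks the joint interpretations formed from the n-best lists of a speech recognizer and a gesture recognizer, discarding pairs whose semantic types conflict. All hypotheses, scores, and the type-matching rule here are invented for this example; the architecture described in the paper reportedly performs fusion by unification over typed feature structures rather than a toy predicate like this one. The point the sketch demonstrates is the one made in the abstract: a correct hypothesis ranked low on one modality's list can be pulled to the top of the joint ranking by the other modality.

```python
# Minimal illustrative sketch of mutual disambiguation (MD) over the
# n-best lists of two recognizers. NOT the paper's implementation;
# every hypothesis, score, and rule below is invented for illustration.
from itertools import product

# Hypothetical n-best lists: (interpretation, semantic type, score).
speech_nbest = [
    ("pan them",     "command", 0.42),  # top speech hypothesis (a misrecognition)
    ("pan 10",       "command", 0.31),
    ("sandbag wall", "object",  0.27),  # the correct hypothesis, ranked last
]
gesture_nbest = [
    ("line",  "object",  0.80),  # a drawn line, i.e. an object to be placed
    ("arrow", "command", 0.20),
]

def compatible(speech_type: str, gesture_type: str) -> bool:
    """Toy cross-modal constraint: a spoken object label must combine
    with an object-creating gesture, a spoken command with a command
    gesture. This predicate only mimics the kind of constraint that
    unification over typed feature structures would enforce."""
    return speech_type == gesture_type

# Score every joint interpretation, discarding incompatible pairs.
joint = [
    (s_score * g_score, s_text, g_text)
    for (s_text, s_type, s_score), (g_text, g_type, g_score)
    in product(speech_nbest, gesture_nbest)
    if compatible(s_type, g_type)
]

best_score, best_speech, best_gesture = max(joint)
print(f"{best_speech!r} + {best_gesture!r} (score {best_score:.3f})")
# -> 'sandbag wall' + 'line' (score 0.216): the gesture has pulled the
#    third-ranked speech hypothesis up to first place in the joint list.
```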

Published in

CHI '99: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
May 1999, 632 pages
ISBN: 0201485591
DOI: 10.1145/302979

        Copyright © 1999 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

CHI '99 Paper Acceptance Rate: 78 of 312 submissions, 25%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%

