DOI: 10.1145/1040830.1040850
Article

Linguistic theories in efficient multimodal reference resolution: an empirical investigation

Published: 10 January 2005

ABSTRACT

Multimodal conversational interfaces let users communicate with computer systems naturally through multiple modalities such as speech, gesture, and gaze. Building effective multimodal interfaces requires understanding users' multimodal inputs. Previous linguistic and cognitive studies indicate that user language behavior is not random but follows certain linguistic and cognitive principles. This paper therefore investigates the use of linguistic theories in multimodal interpretation. In particular, we present a greedy algorithm that incorporates Conversational Implicature and the Givenness Hierarchy for efficient multimodal reference resolution. Empirical studies indicate that this algorithm significantly reduces the complexity of multimodal reference resolution compared to a previous graph-matching approach. One major advantage of the greedy algorithm is that prior linguistic and cognitive knowledge can guide the search and significantly prune the search space. Because of its simplicity and generality, this approach has the potential to improve the robustness of interpretation and to provide a more practical solution to multimodal input interpretation.
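
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: resolve each referring expression greedily against candidates ordered by a salience ranking, so that the ranking prunes the search. The tier names (`gesture`, `focus`, `visible`), the `Candidate` and `Expression` types, and the scores are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical salience tiers, loosely inspired by the idea of a
# givenness ordering: lower rank means more salient, so tried first.
STATUS_RANK = {"gesture": 0, "focus": 1, "visible": 2}


@dataclass(frozen=True)
class Candidate:
    name: str      # object identifier
    kind: str      # semantic type, e.g. "house"
    status: str    # a key of STATUS_RANK
    score: float   # assumed selection likelihood in [0, 1]


@dataclass(frozen=True)
class Expression:
    text: str             # surface form of the referring expression
    kind: Optional[str]   # required semantic type, None if unconstrained
    count: int            # number of referents the expression asks for


def resolve(expressions, candidates):
    """Greedily assign candidates to expressions, most salient first."""
    # Sort once: by tier, then by descending score within a tier.
    ordered = sorted(candidates, key=lambda c: (STATUS_RANK[c.status], -c.score))
    used = set()
    result = {}
    for expr in expressions:
        picked = []
        for cand in ordered:
            if len(picked) == expr.count:
                break  # expression is satisfied; stop searching
            if cand.name in used:
                continue  # each object is assigned at most once
            if expr.kind is not None and cand.kind != expr.kind:
                continue  # semantic type constraint is violated
            picked.append(cand.name)
            used.add(cand.name)
        result[expr.text] = picked
    return result


candidates = [
    Candidate("house_3", "house", "visible", 0.2),
    Candidate("house_1", "house", "gesture", 0.9),
    Candidate("town_2", "town", "focus", 0.7),
    Candidate("house_2", "house", "gesture", 0.4),
]
expressions = [
    Expression("these two houses", "house", 2),
    Expression("it", None, 1),
]
result = resolve(expressions, candidates)
print(result)
```

In this toy setup each expression needs only one pass over the sorted candidates and never revisits an earlier choice, which is what makes the strategy greedy. The trade-off is that an early assignment is never reconsidered, unlike an approach that searches over complete matchings.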

References

  1. Chai, J., Hong, P., Zhou, M. X., and Prasov, Z. 2004c. Optimization in Multimodal Interpretation. In Proceedings of ACL, 2004, pp. 1--8. Barcelona, Spain.
  2. Chai, J., Prasov, Z., and Hong, P. 2004b. Performance Evaluation and Error Analysis for Multimodal Reference Resolution in a Conversational System. Proceedings of HLT-NAACL 2004 (Companion Volume).
  3. Chai, J., Hong, P., and Zhou, M. X. 2004a. A Probabilistic Approach to Reference Resolution in Multimodal User Interfaces. Proceedings of 9th International Conference on Intelligent User Interfaces (IUI): 70--77.
  4. Chai, J., Pan, S., Zhou, M., and Houck, K. Context-based Multimodal Interpretation in Conversational Systems. Proceedings of 4th ICMI, 2002.
  5. Cohen, P. The Pragmatics of Referring and Modality of Communication. Computational Linguistics, 10, pp. 97--146. 1984.
  6. Cohen, P., Johnston, M., McGee, D., Oviatt, S., Pittman, J., Smith, I., Chen, L., and Clow, J. Quickset: Multimodal Interaction for Distributed Applications. Proceedings of ACM Multimedia, 1996, pp. 31--40.
  7. Grice, H. P. Logic and Conversation. In Cole, P., and Morgan, J., eds. Speech Acts. New York, New York: Academic Press. 41--58. 1975.
  8. Grosz, B. J. and Sidner, C. Attention, intention, and the structure of discourse. Computational Linguistics, 12(3):175--204. 1986.
  9. Gundel, J. K., Hedberg, N., and Zacharski, R. Cognitive Status and the Form of Referring Expressions in Discourse. Language 69(2):274--307. 1993.
  10. Huls, C., Bos, E., and Classen, W. 1995. Automatic Referent Resolution of Deictic and Anaphoric Expressions. Computational Linguistics, 21(1):59--79.
  11. Johnston, M., Cohen, P., McGee, D., Oviatt, S., Pittman, J., and Smith, I. Unification-based Multimodal Integration. Proceedings of ACL'97, 1997.
  12. Johnston, M. Unification-based Multimodal Parsing. Proceedings of COLING-ACL'98, 1998.
  13. Johnston, M. and Bangalore, S. Finite-state multimodal parsing and understanding. Proc. COLING'00. 2000.
  14. Kehler, A. Cognitive Status and Form of Reference in Multimodal Human-Computer Interaction. Proceedings of AAAI'01, 2000, pp. 685--689.
  15. Oviatt, S. L. Multimodal interfaces for dynamic interactive maps. In Proceedings of Conference on Human Factors in Computing Systems: CHI '96, 1996, pp. 95--102.
  16. Oviatt, S., DeAngeli, A., and Kuhn, K. Integration and Synchronization of Input Modes during Multimodal Human-Computer Interaction. In Proceedings of Conference on Human Factors in Computing Systems: CHI '97, 1997.
  17. Oviatt, S. L. Mutual Disambiguation of Recognition Errors in a Multimodal Architecture. In Proceedings of Conference on Human Factors in Computing Systems: CHI '99.
  18. Oviatt, S. L. Multimodal System Processing in Mobile Environments. In Proceedings of the Thirteenth Annual ACM Symposium on User Interface Software Technology (UIST'2000), 21--30. New York: ACM Press.
  19. Wahlster, W. User and Discourse Models for Multimodal Communication. Intelligent User Interfaces, M. Maybury and W. Wahlster (eds.), 1998, pp. 359--370.
  20. Zancanaro, M., Stock, O., and Strapparava, C. 1997. Multimodal Interaction for Information Access: Exploiting Cohesion. Computational Intelligence 13(7):439--464.

        • Published in

          IUI '05: Proceedings of the 10th international conference on Intelligent user interfaces
          January 2005
          344 pages
          ISBN: 1581138946
          DOI: 10.1145/1040830

          Copyright © 2005 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate: 746 of 2,811 submissions, 27%
