Article
DOI: 10.1145/1040830.1040850

Linguistic theories in efficient multimodal reference resolution: an empirical investigation

Published: 10 January 2005

Abstract

Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech, gesture, and gaze. Building effective multimodal interfaces requires understanding user multimodal inputs. Previous linguistic and cognitive studies indicate that user language behavior does not occur randomly, but rather follows certain linguistic and cognitive principles. This paper therefore investigates the use of linguistic theories in multimodal interpretation. In particular, we present a greedy algorithm that incorporates Conversational Implicature and the Givenness Hierarchy for efficient multimodal reference resolution. Empirical studies indicate that this algorithm significantly reduces the complexity of multimodal reference resolution compared to a previous graph-matching approach. A major advantage of the greedy algorithm is that prior linguistic and cognitive knowledge can be used to guide the search and significantly prune the search space. Because of its simplicity and generality, this approach has the potential to improve the robustness of interpretation and to provide a more practical solution to multimodal input interpretation.
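
The abstract describes the algorithm only at a high level. As a rough illustration of the idea, the sketch below greedily resolves one referring expression against a set of candidate objects. All names here (ReferringExpression, Candidate, salience) are hypothetical, and the single salience score and the hard constraint check are loose stand-ins for the Givenness Hierarchy ordering and Conversational Implicature respectively; this is a minimal sketch under those assumptions, not the authors' actual formulation.

```python
from dataclasses import dataclass

# Hypothetical data structures for illustration; the paper's actual
# representation of speech and gesture input is richer (e.g., it also
# uses temporal alignment between gestures and referring expressions).
@dataclass
class ReferringExpression:
    text: str          # surface form, e.g., "this red house"
    number: int        # how many referents the expression asks for
    constraints: dict  # stated semantic constraints, e.g., {"type": "house"}

@dataclass
class Candidate:
    obj_id: str
    features: dict     # object attributes, e.g., {"type": "house", "color": "red"}
    salience: float    # stand-in for cognitive status (gesture/gaze/discourse recency)

def is_compatible(expr: ReferringExpression, cand: Candidate) -> bool:
    """Conversational Implicature (Quantity) as a hard filter: every
    attribute the speaker bothered to mention must hold of the referent."""
    return all(cand.features.get(k) == v for k, v in expr.constraints.items())

def resolve(expr: ReferringExpression, candidates: list[Candidate]) -> list[Candidate]:
    """Greedy resolution: rank candidates by salience (a one-dimensional
    stand-in for the Givenness Hierarchy: in focus > activated > familiar
    > ...) and take the top compatible ones until the expression's
    cardinality is satisfied, with no backtracking over pairings."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c.salience, reverse=True):
        if is_compatible(expr, cand):
            chosen.append(cand)
            if len(chosen) == expr.number:
                break
    return chosen

# Example: "this red house" uttered while pointing near h1.
houses = [
    Candidate("h1", {"type": "house", "color": "red"}, salience=0.9),
    Candidate("h2", {"type": "house", "color": "red"}, salience=0.3),
    Candidate("h3", {"type": "house", "color": "blue"}, salience=0.5),
]
expr = ReferringExpression("this red house", number=1,
                           constraints={"type": "house", "color": "red"})
print([c.obj_id for c in resolve(expr, houses)])  # -> ['h1']
```

Whereas a graph-matching formulation scores all pairings of expressions and objects jointly, the greedy pass above commits immediately to the most salient compatible candidate: the linguistic knowledge prunes the space up front, which mirrors the complexity reduction the abstract claims.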




Information

Published In

IUI '05: Proceedings of the 10th international conference on Intelligent user interfaces
January 2005
344 pages
ISBN: 1581138946
DOI: 10.1145/1040830


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. multimodal input interpretation
  2. reference resolution

Qualifiers

  • Article

Conference

IUI '05: Tenth International Conference on Intelligent User Interfaces
January 10 - 13, 2005
San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%


