ABSTRACT
Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech, gesture, and gaze. Building effective multimodal interfaces requires an understanding of user multimodal inputs. Previous linguistic and cognitive studies indicate that user language behavior does not occur randomly but follows certain linguistic and cognitive principles. This paper therefore investigates the use of linguistic theories in multimodal interpretation. In particular, we present a greedy algorithm that incorporates Conversational Implicature and the Givenness Hierarchy for efficient multimodal reference resolution. Empirical studies indicate that this algorithm significantly reduces the complexity of multimodal reference resolution compared to a previous graph-matching approach. One major advantage of the greedy algorithm is that prior linguistic and cognitive knowledge can guide the search and significantly prune the search space. Because of its simplicity and generality, the approach has the potential to improve the robustness of interpretation and to provide a more practical solution to multimodal input interpretation.
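The abstract names the two linguistic sources of pruning but not their mechanics. As a rough illustrative sketch only, not a reproduction of the paper's actual algorithm or features, the following Python shows one way a greedy resolver can use Givenness Hierarchy statuses as a ranking prior over candidate referents and temporal speech-gesture alignment as a tie-breaker. All class names, statuses, and fields here are hypothetical.

```python
from dataclasses import dataclass

# Cognitive statuses adapted from the Givenness Hierarchy (Gundel et al. 1993),
# ordered from most to least accessible. The mapping of candidate sources
# (gesture, visual focus, conversation history) to statuses is illustrative.
STATUS_RANK = {"in_focus": 0, "activated": 1, "familiar": 2}

@dataclass
class Candidate:
    obj_id: str          # identifier of a potential referent
    status: str          # cognitive status of this candidate
    semantic_type: str   # e.g. "house", "city"
    time: float          # when it was gestured at or mentioned (seconds)

@dataclass
class ReferringExpr:
    text: str            # e.g. "this house", "it"
    semantic_type: str   # type constraint from the noun phrase, or None
    time: float          # onset time of the expression in the utterance
    number: int = 1      # how many referents the expression requires

def compatible(expr, cand):
    """Semantic-type agreement acts as a hard constraint."""
    return expr.semantic_type in (None, cand.semantic_type)

def resolve(exprs, candidates):
    """Greedy one-pass resolution: process expressions in temporal order;
    for each, take the unassigned compatible candidates with the highest
    cognitive status, breaking ties by speech-gesture temporal proximity.
    Ranking candidates up front, rather than searching over all pairings
    as in graph matching, is what keeps the pass linear per expression."""
    assignment, used = {}, set()
    for expr in sorted(exprs, key=lambda e: e.time):
        pool = [c for c in candidates
                if c.obj_id not in used and compatible(expr, c)]
        # Rank: higher givenness first, then smaller speech-gesture gap.
        pool.sort(key=lambda c: (STATUS_RANK[c.status],
                                 abs(c.time - expr.time)))
        chosen = pool[:expr.number]
        assignment[expr.text] = [c.obj_id for c in chosen]
        used.update(c.obj_id for c in chosen)
    return assignment
```

For example, an utterance like "compare this house with that house" accompanied by two pointing gestures resolves by temporal alignment once both candidates share the same cognitive status:

```python
exprs = [ReferringExpr("this house", "house", 0.4),
         ReferringExpr("that house", "house", 1.1)]
cands = [Candidate("h1", "activated", "house", 0.5),   # first gesture
         Candidate("h2", "activated", "house", 1.2)]   # second gesture
print(resolve(exprs, cands))  # {'this house': ['h1'], 'that house': ['h2']}
```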