Research article
DOI: 10.1145/1891903.1891945

Location grounding in multimodal local search

Published: 08 November 2010

Abstract

Computational models of dialog context have often focused on unimodal spoken dialog or text, using the language itself as the primary locus of contextual information. But as we move from spoken interaction to situated multimodal interaction on mobile platforms supporting a combination of spoken dialog with graphical interaction, touch-screen input, geolocation, and other non-linguistic contextual factors, we will need more sophisticated models of context that capture the influence of these factors on semantic interpretation and dialog flow. Here we focus on how users establish the location they deem salient from the multimodal context by grounding it through interactions with a map-based query system. While many existing systems rely on geolocation to establish the location context of a query, we hypothesize that this approach often ignores the grounding actions users make, and provide an analysis of log data from one such system that reveals errors that arise from that faulty treatment of grounding. We then explore and evaluate, using live field data from a deployed multimodal search system, several different context classification techniques that attempt to learn the location contexts users make salient by grounding them through their multimodal actions.
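The classification task the abstract describes — deciding which location a user has grounded as salient (the device's geolocation, the map region they have navigated to, or a place named in the query itself) — can be sketched as follows. This is an illustrative stand-in, not the paper's actual features, thresholds, or classifiers: the feature names, the 30-second recency window, and the decision rules are all invented for exposition.

```python
from dataclasses import dataclass

@dataclass
class QueryContext:
    # Hypothetical features one might extract from multimodal interaction logs
    secs_since_map_gesture: float   # time since the user last panned/zoomed/touched the map
    viewport_matches_gps: bool      # does the visible map region contain the device location?
    has_explicit_location: bool     # does the spoken query name a place ("pizza in Boston")?

def salient_location_context(ctx: QueryContext) -> str:
    """Illustrative stand-in for a learned context classifier: predict which
    location context the user's grounding actions have made salient."""
    if ctx.has_explicit_location:
        return "stated-location"    # the query itself carries the location
    if ctx.secs_since_map_gesture < 30.0:
        return "map-viewport"       # a recent map gesture grounds the visible region
    if ctx.viewport_matches_gps:
        return "geolocation"        # map still shows where the device is; GPS is a safe default
    return "map-viewport"           # user navigated away earlier; the viewport stays salient

# Example: the user just panned the map to another city, then asked "coffee shops"
print(salient_location_context(QueryContext(5.0, False, False)))   # → map-viewport
```

In a deployed system these hand-written rules would be replaced by a classifier trained on logged interactions, but the feature space — gesture recency, viewport/GPS agreement, explicit location mentions — illustrates the kind of multimodal evidence such a model would consume.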




Published In

ICMI-MLMI '10: International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
November 2010
311 pages
ISBN:9781450304146
DOI:10.1145/1891903

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. dialog
  2. gesture
  3. location-based
  4. multimodal
  5. search
  6. speech

Qualifiers

  • Research-article

Conference

ICMI-MLMI '10

Acceptance Rates

ICMI-MLMI '10 paper acceptance rate: 41 of 100 submissions (41%).
Overall acceptance rate: 453 of 1,080 submissions (42%).


Cited By

  • (2019) Multimodal integration for interactive conversational systems. The Handbook of Multimodal-Multisensor Interfaces, 21-76. DOI: 10.1145/3233795.3233798
  • (2013) A multimodal dialogue interface for mobile local search. Proceedings of the Companion Publication of the 2013 International Conference on Intelligent User Interfaces, 63-64. DOI: 10.1145/2451176.2451200
  • (2013) A unified framework for multimodal retrieval. Pattern Recognition 46:12, 3358-3370. DOI: 10.1016/j.patcog.2013.05.023
  • (2012) Multimodal dialogue in mobile local search. Proceedings of the 14th ACM International Conference on Multimodal Interaction, 303-304. DOI: 10.1145/2388676.2388741
  • (2012) Collecting multimodal data in the wild. Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, 339-340. DOI: 10.1145/2166966.2167042
  • (2012) Multimodal interaction patterns in mobile local search. Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, 21-24. DOI: 10.1145/2166966.2166970
  • (2011) Multimodal local search in Speak4it. Proceedings of the 16th International Conference on Intelligent User Interfaces, 435-436. DOI: 10.1145/1943403.1943486
  • (2011) Speech and Multimodal Interaction in Mobile Search. IEEE Signal Processing Magazine 28:4, 40-49. DOI: 10.1109/MSP.2011.941073
  • (2010) Speak4It: Multimodal interaction in the wild. 2010 IEEE Spoken Language Technology Workshop, 159-160. DOI: 10.1109/SLT.2010.5700840
