Abstract
We present an implemented model for speech recognition in natural environments which relies on contextual information about salient entities to prime utterance recognition. The hypothesis underlying our approach is that, in situated human-robot interaction, speech recognition performance can be significantly enhanced by exploiting knowledge about the immediate physical environment and the dialogue history. To this end, visual salience (objects perceived in the physical scene) and linguistic salience (objects previously referred to within the current dialogue) are integrated into a single cross-modal salience model. The model is dynamically updated as the environment evolves and is used to establish expectations about which words are most likely to be heard in the given context. The update is realised by continuously adapting the word-class probabilities specified in the statistical language model. This article discusses the motivations behind our approach, describes our implementation as part of a distributed cognitive architecture for mobile robots, and reports evaluation results on a test suite.
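The core mechanism the abstract describes, adapting word-class probabilities in a statistical language model according to cross-modal salience, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function name, the multiplicative boost, and the renormalisation scheme are all assumptions.

```python
# Hedged sketch: salience-driven adaptation of word-class probabilities
# in a class-based statistical language model. Words associated with
# salient entities (visually perceived or recently mentioned) get their
# probability scaled up; each class is then renormalised to sum to 1.
# All identifiers here are illustrative, not from the paper itself.

def adapt_class_probabilities(base_probs, salient_words, boost=5.0):
    """Return a copy of base_probs with salient words boosted.

    base_probs: dict mapping word class -> {word: probability}
    salient_words: set of words tied to currently salient entities
    boost: multiplicative factor applied to salient words (assumed form)
    """
    adapted = {}
    for word_class, word_probs in base_probs.items():
        scaled = {
            w: p * (boost if w in salient_words else 1.0)
            for w, p in word_probs.items()
        }
        total = sum(scaled.values())
        # Renormalise so the class remains a valid probability distribution.
        adapted[word_class] = {w: p / total for w, p in scaled.items()}
    return adapted

# Example: a ball is visible in the scene (visual salience) and was
# mentioned earlier in the dialogue (linguistic salience).
base = {"object": {"ball": 0.25, "box": 0.25, "mug": 0.25, "chair": 0.25}}
salient = {"ball"}  # output of the (hypothetical) cross-modal salience model
adapted = adapt_class_probabilities(base, salient)
```

With a boost of 5.0, "ball" rises from 0.25 to 0.625 while the other class members drop to 0.125 each, so the recogniser's expectations now favour the contextually salient word without excluding the alternatives.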
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Lison, P. (2010). A Salience-Driven Approach to Speech Recognition for Human-Robot Interaction. In: Icard, T., Muskens, R. (eds) Interfaces: Explorations in Logic, Language and Computation. ESSLLI 2008 and ESSLLI 2009 Student Sessions. Lecture Notes in Computer Science, vol 6211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14729-6_8
Print ISBN: 978-3-642-14728-9
Online ISBN: 978-3-642-14729-6