
Visual Salience and Reference Resolution in Simulated 3-D Environments

Published in: Artificial Intelligence Review (2004)

Abstract

In this paper we present a novel false colouring-based visual saliency algorithm and illustrate how it is used in the situated language interpreter (SLI) system to ground a reference resolution framework for natural language interfaces to 3-D simulated environments. The visual saliency algorithm allows us to dynamically maintain a model of the evolving visual context. The visual saliency scores associated with the elements in the context model can be used to resolve underspecified references.
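The abstract's core idea can be sketched in a few lines: render the scene in "false colours" so that every object has a unique flat colour, count each object's visible pixels (weighting central pixels more heavily, since attention favours the centre of view), normalise the counts into saliency scores, and resolve an underspecified reference by preferring the most salient candidate. The weighting scheme, function names, and data layout below are illustrative assumptions, not the authors' exact formulation — a minimal sketch of the technique rather than the SLI implementation.

```python
# Sketch of false-colouring-based visual saliency, assuming:
#  - id_buffer is the false-colour render read back as a 2-D grid of
#    object ids (0 = background),
#  - saliency is an object's centre-weighted share of visible pixels.
from collections import defaultdict

def saliency_scores(id_buffer, background=0):
    """Return {object_id: score in [0, 1]} from a false-colour buffer,
    weighting each pixel by its closeness to the image centre."""
    h, w = len(id_buffer), len(id_buffer[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    max_d = (cy**2 + cx**2) ** 0.5 or 1.0   # distance to a corner
    raw = defaultdict(float)
    for y, row in enumerate(id_buffer):
        for x, obj in enumerate(row):
            if obj == background:
                continue
            d = ((y - cy) ** 2 + (x - cx) ** 2) ** 0.5
            raw[obj] += 1.0 - d / max_d      # central pixels count more
    total = sum(raw.values()) or 1.0
    return {obj: v / total for obj, v in raw.items()}

def resolve_reference(candidates, scores):
    """Resolve an underspecified reference: among the objects matching
    the linguistic description, pick the most visually salient one."""
    return max(candidates, key=lambda obj: scores.get(obj, 0.0))

# Toy 5x7 buffer: object 1 sits near the centre, object 2 at the edge.
buf = [
    [0, 0, 0, 0, 0, 2, 2],
    [0, 0, 1, 1, 0, 2, 2],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
scores = saliency_scores(buf)
print(resolve_reference([1, 2], scores))   # → 1 (the central object)
```

Re-rendering and re-scoring the buffer each frame is what lets the context model evolve with the view: as the camera moves, the scores shift, and the same underspecified phrase can resolve to a different object.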




Cite this article

Kelleher, J., van Genabith, J. Visual Salience and Reference Resolution in Simulated 3-D Environments. Artificial Intelligence Review 21, 253–267 (2004). https://doi.org/10.1023/B:AIRE.0000036258.60851.83
