Abstract
In this paper we present a novel false-colouring-based visual saliency algorithm and illustrate how it is used in the situated language interpreter (SLI) system to ground a reference resolution framework for natural language interfaces to 3-D simulated environments. The visual saliency algorithm allows us to dynamically maintain a model of the evolving visual context. The visual saliency scores associated with the elements in this context model can then be used to resolve underspecified references.
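The following is a minimal sketch of the pipeline the abstract describes, written under explicit assumptions; it is not the SLI system's code. It assumes each object has already been rendered in a unique false colour, represented here as an integer id per pixel (0 for background). It also assumes that an object's saliency is the sum of its pixels weighted by distance from the image centre. The linear centre weighting, the decay factor in the context update, and the names `frame_salience`, `update_context` and `resolve` are all hypothetical. The type label used to filter candidates stands in for matching the linguistic description.

```python
from __future__ import annotations

import math
from collections import defaultdict

BACKGROUND = 0  # assumption: false colour 0 means "no object"


def pixel_weight(x: int, y: int, width: int, height: int) -> float:
    """Hypothetical centre weighting: 1.0 at the image centre, falling linearly to 0.0 at the corners."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    d_max = math.hypot(cx, cy)
    if d_max == 0:  # a 1x1 image has only a centre pixel
        return 1.0
    return 1.0 - math.hypot(x - cx, y - cy) / d_max


def frame_salience(frame: list[list[int]]) -> dict[int, float]:
    """Sum the centre weights of every pixel rendered in each object's false colour."""
    height, width = len(frame), len(frame[0])
    scores: dict[int, float] = defaultdict(float)
    for y, row in enumerate(frame):
        for x, obj_id in enumerate(row):
            if obj_id != BACKGROUND:
                scores[obj_id] += pixel_weight(x, y, width, height)
    return dict(scores)


def update_context(context: dict[int, float], frame_scores: dict[int, float],
                   decay: float = 0.5) -> dict[int, float]:
    """Hypothetical context-model update: every remembered score decays,
    then objects visible in the current frame take their new score."""
    updated = {obj: score * decay for obj, score in context.items()}
    updated.update(frame_scores)
    return updated


def resolve(wanted_type: str, types: dict[int, str], context: dict[int, float]) -> int | None:
    """Resolve an underspecified reference such as "the chair" to the most salient matching object."""
    candidates = [obj for obj, t in types.items() if t == wanted_type and obj in context]
    if not candidates:
        return None
    return max(candidates, key=lambda obj: context[obj])


if __name__ == "__main__":
    # 5x5 false-colour frame: object 1 fills the centre pixel, object 2 the whole left column.
    frame = [[2, 0, 0, 0, 0],
             [2, 0, 0, 0, 0],
             [2, 0, 1, 0, 0],
             [2, 0, 0, 0, 0],
             [2, 0, 0, 0, 0]]
    types = {1: "chair", 2: "chair"}
    context = update_context({}, frame_salience(frame))
    print(resolve("chair", types, context))  # → 1: the small central chair outranks the larger peripheral one
```

Under this particular weighting, one pixel at the centre of the view (score 1.0) outweighs five pixels along the edge (about 0.71). This illustrates why a centre-weighted count, rather than raw object size, is a plausible proxy for where the user is looking.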
Cite this article
Kelleher, J., van Genabith, J. Visual Salience and Reference Resolution in Simulated 3-D Environments. Artificial Intelligence Review 21, 253–267 (2004). https://doi.org/10.1023/B:AIRE.0000036258.60851.83