An Alternative Suggestion for Vision-Language Integration in Intelligent Agents

Pastra, Katerina

doi:10.1007/11752912_75

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3955)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence

Abstract

State-of-the-art artificial agents rely heavily on human intervention to perform vision-language integration; apart from being costly and effort-intensive, this intervention deprives artificial agents of the ability to react intelligently and to show intentionality when engaged in situated multimodal communication. In this paper, we suggest an alternative way of building vision-language integration prototypes with limited human intervention. The suggestions emerged from the development of such a prototype for the verbalisation of visual scenes in a property-surveillance task.

Author information

Authors and Affiliations

Institute for Language and Speech Processing, Artemidos 6 and Epidavrou, Maroussi, 151-25, Greece
Katerina Pastra

Editor information

Editors and Affiliations

Computer Science Department of University of Crete, Greece
Grigoris Antoniou
Institute of Computer Science, Foundation for Research & Technology – Hellas (FORTH), Vassilika Vouton, P.O. Box 1385, 71110, Heraklion, Greece
George Potamias
Institute of Informatics and Telecommunications, NCSR "Demokritos", 15310 A., Paraskevi Attikis, Greece
Costas Spyropoulos
Institute of Computer Science, FO.R.T.H., Vassilika Vouton, P.O. Box 1385, GR 71110, Heraklion, Greece
Dimitris Plexousakis

About this paper

Cite this paper

Pastra, K. (2006). An Alternative Suggestion for Vision-Language Integration in Intelligent Agents. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science, vol 3955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752912_75

DOI: https://doi.org/10.1007/11752912_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34117-8
Online ISBN: 978-3-540-34118-5
eBook Packages: Computer Science, Computer Science (R0)
