Abstract
Multi-modal tasks have started to play a significant role in Artificial Intelligence research. A prominent example is visual-linguistic tasks, such as Visual Question Answering and its extension, Visual Dialog. In this paper, we concentrate on the Visual Dialog task and dataset. The task involves two agents. The first agent does not see the image and asks questions about its content. The second agent sees the image and answers the questions. The symbol grounding problem, that is, how symbols obtain their meanings, plays a crucial role in such tasks. We approach this problem from a semiotic point of view and propose the Vector Semiotic Architecture for Visual Dialog. The Vector Semiotic Architecture combines the Sign-Based World Model with the Vector Symbolic Architecture. The Sign-Based World Model represents the agent's knowledge at a high level of abstraction and allows a uniform representation of different aspects of that knowledge, organizing it hierarchically as a special kind of semantic network. The Vector Symbolic Architecture provides the computational level and allows symbols to be manipulated as numerical vectors using simple element-wise operations. This combination makes it possible to ground an object representation from any level of abstraction in the agent's sensory input.
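The element-wise operations mentioned above can be illustrated with a minimal sketch. The abstract does not say which Vector Symbolic Architecture variant the paper uses, so the code below assumes a generic bipolar scheme: binding is element-wise multiplication and bundling is an element-wise majority vote. The dimensionality `DIM`, the helper names, and the attribute names (`color`, `red`, and so on) are hypothetical and chosen only for illustration.

```python
import random

DIM = 10_000  # assumed dimensionality; random high-dimensional vectors are nearly orthogonal


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Binding: element-wise multiplication (its own inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Bundling: element-wise majority vote; a tie (even count only) resolves to +1."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]; close to 0 for unrelated vectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Role vectors (attribute names) and filler vectors (attribute values).
color, shape, size = random_hv(rng), random_hv(rng), random_hv(rng)
red, cube, small = random_hv(rng), random_hv(rng), random_hv(rng)

# Encode the object "small red cube" as a bundle of role-filler bindings.
obj = bundle(bind(color, red), bind(shape, cube), bind(size, small))

# Unbinding with a role vector yields a noisy copy of the matching filler.
query = bind(obj, color)
sim_red = similarity(query, red)    # clearly above chance
sim_cube = similarity(query, cube)  # close to zero
```

Comparing `query` against a dictionary of known filler vectors and taking the most similar one would answer a question such as "what color is the object?", which is the kind of symbol-level lookup the vector level is meant to support.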
Acknowledgements
The reported study was supported by RFBR, research projects No. 19-37-90164 and 18-29-22027.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kovalev, A.K., Shaban, M., Chuganskaya, A.A., Panov, A.I. (2021). Applying Vector Symbolic Architecture and Semiotic Approach to Visual Dialog. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_21
DOI: https://doi.org/10.1007/978-3-030-86271-8_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86270-1
Online ISBN: 978-3-030-86271-8
eBook Packages: Computer Science, Computer Science (R0)