Applying Vector Symbolic Architecture and Semiotic Approach to Visual Dialog

Kovalev, Alexey K.; Shaban, Makhmud; Chuganskaya, Anfisa A.; Panov, Aleksandr I.

doi:10.1007/978-3-030-86271-8_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12886)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

Multi-modal tasks have started to play a significant role in Artificial Intelligence research. A particular example from that domain is visual-linguistic tasks, such as Visual Question Answering and its extension, Visual Dialog. In this paper, we concentrate on the Visual Dialog task and dataset. The task involves two agents. The first agent does not see an image and asks questions about its content. The second agent sees the image and answers the questions. The symbol grounding problem, or how symbols obtain their meanings, plays a crucial role in such tasks. We approach that problem from the semiotic point of view and propose the Vector Semiotic Architecture for Visual Dialog. The Vector Semiotic Architecture is a combination of the Sign-Based World Model and the Vector Symbolic Architecture. The Sign-Based World Model represents agent knowledge at a high level of abstraction and allows a uniform representation of different aspects of knowledge, forming a hierarchical representation of that knowledge in the form of a special kind of semantic network. The Vector Symbolic Architecture represents the computational level and allows symbols to be manipulated as numerical vectors using simple element-wise operations. This combination enables grounding object representations from any level of abstraction in the agent's sensory input.
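
To make the phrase "simple element-wise operations" concrete, the following is a minimal sketch of one common Vector Symbolic Architecture variant, not the authors' implementation. It assumes random bipolar hypervectors, binding by element-wise multiplication, and bundling by an element-wise majority vote. The dimensionality, the role names (color, shape, size), the filler names, and all function names are hypothetical and chosen only for illustration.

```python
import random

DIM = 10_000  # hypervector dimensionality (assumed value, not from the paper)


def random_hv(rng, dim=DIM):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose hypervectors: element-wise sum, then sign (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: 1 for identical vectors, near 0 for unrelated ones."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)

# Hypothetical role vectors and filler vectors describing one object in an image.
color, shape, size = (random_hv(rng) for _ in range(3))
red, blue, cube, large = (random_hv(rng) for _ in range(4))
fillers = {"red": red, "blue": blue, "cube": cube, "large": large}

# Encode "a large red cube" as a bundle of role-filler bindings.
obj = bundle(bind(color, red), bind(shape, cube), bind(size, large))

# Answer "what colour is it?": binding with the role vector again undoes the
# binding (each bipolar vector is its own inverse) and leaves a noisy filler.
noisy = bind(obj, color)
best = max(fillers, key=lambda name: similarity(noisy, fillers[name]))
print(best)
```

In this sketch the recovered vector is only approximately equal to the stored filler, so the answer is read out by comparing it against every known filler and taking the most similar one. High dimensionality is what keeps unrelated vectors nearly orthogonal and makes that comparison reliable.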

Acknowledgements

The reported study was supported by RFBR, research projects No. 19-37-90164 and No. 18-29-22027.

Author information

Authors and Affiliations

Artificial Intelligence Research Institute FRC CSC RAS, Moscow, Russia
Alexey K. Kovalev, Anfisa A. Chuganskaya & Aleksandr I. Panov
HSE University, Moscow, Russia
Alexey K. Kovalev & Makhmud Shaban
Moscow Institute of Physics and Technology, Moscow, Russia
Aleksandr I. Panov

Editor information

Editors and Affiliations

University of Deusto, Bilbao, Spain
Hugo Sanjurjo González
University of Deusto, Bilbao, Spain
Iker Pastor López
University of Deusto, Bilbao, Spain
Pablo García Bringas
University of A Coruña, A Coruña, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

Cite this paper

Kovalev, A.K., Shaban, M., Chuganskaya, A.A., Panov, A.I. (2021). Applying Vector Symbolic Architecture and Semiotic Approach to Visual Dialog. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_21

DOI: https://doi.org/10.1007/978-3-030-86271-8_21
Published: 15 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86270-1
Online ISBN: 978-3-030-86271-8
eBook Packages: Computer Science, Computer Science (R0)
