Question Answering for Visual Navigation in Human-Centered Environments

Kirilenko, Daniil E.; Kovalev, Alexey K.; Osipov, Evgeny; Panov, Aleksandr I.

doi:10.1007/978-3-030-89820-5_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13068)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

In this paper, we propose HISNav VQA, a challenging dataset for the Visual Question Answering task that is aimed at the needs of Visual Navigation in human-centered environments. The dataset consists of images of various room scenes captured in the Habitat virtual environment, together with questions that matter for navigation tasks relying only on visual information. We also propose a baseline for the HISNav VQA dataset, the Vector Semiotic Architecture, and demonstrate its performance. The Vector Semiotic Architecture combines a Sign-Based World Model with Vector Symbolic Architectures. The Sign-Based World Model represents various aspects of an agent's knowledge, while Vector Symbolic Architectures operate at the low computational level. The Vector Semiotic Architecture addresses the symbol grounding problem, which plays an important role in the Visual Question Answering task.
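To make the role of Vector Symbolic Architectures more concrete, the following is a minimal, generic sketch of role-filler binding with random bipolar hypervectors. It is not the authors' Vector Semiotic Architecture: the dimensionality `DIM`, the role and filler names, the example scene, and the helper functions `random_hv`, `bind`, `bundle`, `similarity`, and `query` are all assumptions introduced here for illustration only.

```python
import random

DIM = 2048  # hypothetical dimensionality, chosen only for illustration


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication.

    For bipolar role vectors, binding with the same role again undoes it.
    """
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose hypervectors by elementwise addition."""
    return [sum(components) for components in zip(*vectors)]


def similarity(a, b):
    """Cosine similarity; close to 0 for unrelated random hypervectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical symbol tables: roles are question slots, fillers are answers.
roles = {name: random_hv(rng) for name in ("object", "color", "location")}
fillers = {
    name: random_hv(rng)
    for name in ("chair", "table", "red", "blue", "left", "right")
}

# Encode one made-up scene fact: a red chair on the left.
scene = bundle(
    bind(roles["object"], fillers["chair"]),
    bind(roles["color"], fillers["red"]),
    bind(roles["location"], fillers["left"]),
)


def query(scene_hv, role):
    """Unbind a role from the scene and return the most similar filler name."""
    noisy_filler = bind(scene_hv, roles[role])
    return max(fillers, key=lambda name: similarity(noisy_filler, fillers[name]))


print(query(scene, "color"))
```

In this sketch, a question such as "what color is the object?" reduces to unbinding the `color` role and comparing the noisy result against the known filler vectors. The crosstalk from the other two bound pairs behaves like noise that shrinks as `DIM` grows, which is why high-dimensional vectors are used.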

Notes

1. https://bit.ly/2XR5OUc
2. https://toloka.yandex.com

Acknowledgements

The reported study was supported by RFBR, research Project No. 19-37-90164.

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, Moscow, Russia
Daniil E. Kirilenko & Aleksandr I. Panov
HSE University, Moscow, Russia
Alexey K. Kovalev
Artificial Intelligence Research Institute FRC CSC RAS, Moscow, Russia
Alexey K. Kovalev & Aleksandr I. Panov
Lulea University of Technology, Lulea, Sweden
Evgeny Osipov

Corresponding author

Correspondence to Alexey K. Kovalev.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Ildar Batyrshin
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Alexander Gelbukh
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Grigori Sidorov

Cite this paper

Kirilenko, D.E., Kovalev, A.K., Osipov, E., Panov, A.I. (2021). Question Answering for Visual Navigation in Human-Centered Environments. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_3

DOI: https://doi.org/10.1007/978-3-030-89820-5_3
Published: 21 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89819-9
Online ISBN: 978-3-030-89820-5
eBook Packages: Computer Science, Computer Science (R0)