Abstract
We argue that an agent’s poor generalization when searching for a target object in challenging visual navigation can be addressed by settling how scene priors are built and where the agent uses them. Recent works encode scene priors as fixed spatial features, which improves generalization in novel environments, but such priors cannot adapt to new scenes. How to build scene priors and where to use them in visual navigation have not been well explored. We propose a visual relationship detection module that adaptively builds a relational scene graph to serve as priors. To use these priors, we propose a Graph attention Markov logical inference Network (GMN) module, which encodes the scene priors and performs precise action inference. GMN updates the graph structure in an unknown scene and estimates the shortest path in the scene graph; the emission probabilities from path to actions are adjusted pointwise by action samples from reinforcement learning to obtain an optimal navigation policy. The whole navigation framework is driven by unsupervised reinforcement learning (RL) to exploit the environment. We conduct experiments in the AI2-THOR virtual environment, and our method outperforms current state-of-the-art approaches in both SPL (Success weighted by Path Length) and success rate.
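For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how SPL and success rate are commonly computed over a set of evaluation episodes, using the standard definition SPL = (1/N) Σ S_i · l_i / max(p_i, l_i). The function names, argument names, and the toy episode values are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i succeeded (S_i)
    shortest_lengths[i] -- shortest-path length from start to target (l_i)
    path_lengths[i]     -- length of the path the agent actually took (p_i)
    Assumption: every shortest-path length is strictly positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if not successes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if l <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if s:
            # A successful episode contributes l / max(p, l), which is at most 1.
            total += l / max(p, l)
    return total / len(successes)


# Toy values (hypothetical): two successes, one failure.
toy_success = [True, True, False]
toy_shortest = [2.0, 4.0, 3.0]
toy_taken = [2.0, 8.0, 5.0]
toy_spl = spl(toy_success, toy_shortest, toy_taken)
toy_sr = success_rate(toy_success)
```

Failed episodes contribute zero, and a successful episode is penalized in proportion to how much longer the agent's path was than the shortest one, so SPL never exceeds the success rate.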
Acknowledgements
We acknowledge the support of the National Key Research and Development Program of China under Grant 2018YFB1305001, and of the Wuhan Science and Technology Planning Application Foundation Frontier Project, No. 2019010701011413.
Cite this article
Zhou, K., Guo, C. & Zhang, H. Relational attention-based Markov logic network for visual navigation. J Supercomput 78, 9907–9933 (2022). https://doi.org/10.1007/s11227-021-04283-5
DOI: https://doi.org/10.1007/s11227-021-04283-5