Relational attention-based Markov logic network for visual navigation

The Journal of Supercomputing

Abstract

We argue that an agent's poor generalization when searching for a target object in challenging visual navigation can be addressed by answering two questions: how to build scene priors and where to use them. Recent works encode scene priors as fixed spatial features to improve generalization in novel environments, but such priors cannot adapt to new scenes. How to build scene priors and where to use them in visual navigation have not been well explored. We propose a visual relationship detection module that adaptively builds a relational scene graph to serve as the prior. To use these priors, we propose a Graph attention Markov logical inference Network (GMN) module, which encodes the scene priors and performs precise action inference. GMN updates the graph structure in an unknown scene and estimates the shortest path in the scene graph; the emission probabilities from path to actions are combined pointwise with action samples from reinforcement learning to obtain an optimal navigation policy. The whole navigation framework is driven by unsupervised reinforcement learning (RL) to exploit the environment. We conduct experiments in the AI2THOR virtual environment, and the results outperform the current state of the art in both SPL (Success weighted by Path Length) and success rate.
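
The two evaluation metrics named in the abstract can be illustrated with a short sketch. It uses the commonly cited definition of SPL, in which each successful episode contributes the ratio of the shortest path length to the larger of the agent's path length and the shortest path length, averaged over all episodes. The function names, argument names, and the handling of zero-length paths are assumptions made for illustration and are not taken from the article.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    if len(successes) == 0:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    agent_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Each episode i contributes S_i * l_i / max(p_i, l_i), where S_i is the
    success flag, l_i the shortest path length, and p_i the length of the
    path the agent actually took.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(agent_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, agent_lengths):
        if not ok:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Assumed convention: an episode that starts at the target
        # (both lengths zero) counts as a perfectly efficient success.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


if __name__ == "__main__":
    # Hypothetical episodes: two successes (one twice as long as needed)
    # and one failure.
    flags = [True, True, False]
    shortest = [2.0, 3.0, 4.0]
    taken = [4.0, 3.0, 10.0]
    print(success_rate(flags))
    print(spl(flags, shortest, taken))
```

Because SPL discounts successes reached along inefficient paths, it is never larger than the success rate on the same set of episodes.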

Acknowledgements

We acknowledge the support of the National Key Research and Development Program of China under Grant 2018YFB1305001 and of the Wuhan Science and Technology Planning Application Foundation Frontier Project, No. 2019010701011413.

Author information

Correspondence to Chi Guo or Huyin Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, K., Guo, C. & Zhang, H. Relational attention-based Markov logic network for visual navigation. J Supercomput 78, 9907–9933 (2022). https://doi.org/10.1007/s11227-021-04283-5

  • DOI: https://doi.org/10.1007/s11227-021-04283-5

Keywords

Navigation