
Deep introspective SLAM: deep reinforcement learning based approach to avoid tracking failure in visual SLAM

Published in: Autonomous Robots

Abstract

Reliable and consistent tracking is essential to realize the dream of power-on-and-go autonomy in mobile robots. Our investigation of state-of-the-art visual navigation and mapping tools (e.g., ORB-SLAM) reveals that these tools suffer from frequent and unexpected tracking failures, especially when tested in the wild. This hinders the ability of robots to reach even a goal position less than 10 meters away without tracking failure, thereby limiting the prospects of real autonomy. We present an introspection-based approach (Introspective-SLAM) that enables SLAM to evaluate the safety of navigation steps with respect to tracking failure before the steps are actually taken. Navigation steps that appear unsafe are avoided, and an alternative path to the goal is planned. We propose a novel deep reinforcement learning network, a deep Q-network (DQN), that evaluates the safety of future navigation steps from a single image only. Surprisingly, training of our DQN completes in a short amount of time (< 60 h). Even so, the network outperforms several handcrafted and Q-learning based pipelines and achieves state-of-the-art performance. Interestingly, training the DQN in a realistic simulator (MINOS), consisting of reconstructed interiors, generalizes well to real-world indoor and outdoor settings. Finally, extensive testing of visual SLAM equipped with our DQN shows that tracking failures occur frequently and remain a major hindrance to reaching the goal. Currently, there is no standard benchmark for evaluating active visual SLAM approaches, so we release a benchmark of 50 episodes with this work. We hope these findings and the benchmark will encourage progress toward power-on-and-go visual SLAM without any manual supervision.
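
To make the step-safety idea concrete, the minimal sketch below (Python/PyTorch) illustrates how a DQN could score candidate navigation steps from a single camera image and flag any step whose predicted value falls below a threshold, so that an unsafe step can be avoided before it is taken. This is an illustrative assumption only, not the authors' released implementation: the names SafetyDQN, ACTIONS, and SAFETY_THRESHOLD, the action set, and the network architecture are all hypothetical.

```python
# Conceptual sketch only: NOT the authors' released code. Assumes PyTorch and
# hypothetical names (SafetyDQN, ACTIONS, SAFETY_THRESHOLD) to illustrate scoring
# the safety of candidate navigation steps from a single RGB image.
import torch
import torch.nn as nn

ACTIONS = ["move_forward", "turn_left", "turn_right"]  # hypothetical discrete action set
SAFETY_THRESHOLD = 0.0                                  # hypothetical Q-value cut-off

class SafetyDQN(nn.Module):
    """Maps a single RGB image to one Q-value per candidate navigation action."""
    def __init__(self, num_actions: int = len(ACTIONS)):
        super().__init__()
        self.encoder = nn.Sequential(                   # small convolutional trunk (illustrative)
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image))

def safe_actions(model: SafetyDQN, image: torch.Tensor) -> list[str]:
    """Return the candidate steps whose predicted Q-value clears the safety threshold."""
    with torch.no_grad():
        q_values = model(image.unsqueeze(0)).squeeze(0)  # shape: (num_actions,)
    return [a for a, q in zip(ACTIONS, q_values.tolist()) if q > SAFETY_THRESHOLD]

if __name__ == "__main__":
    model = SafetyDQN().eval()
    frame = torch.rand(3, 120, 160)                      # stand-in for the current camera frame
    print("steps judged safe:", safe_actions(model, frame))
```

In this sketch, steps the network judges unsafe are simply excluded from the planner's candidate set, which is the gating behaviour the abstract describes; the actual reward design, training loop, and thresholding used in the paper are not reproduced here.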

Notes

  1. https://www.youtube.com/watch?v=Df9WhgibCQA&ab_channel=imperialrobotvision

  2. https://tinyurl.com/4q6wm453.

  3. https://github.com/knaveed20/Deep-Introspective-SLAM/.

Acknowledgements

This work was funded by the Higher Education Commission (HEC), Government of Pakistan, through two research grants: 10023/Federal/NRPU/R&D/HEC/2017 and 20-13396/NRPU/R&D/HEC/2020.

Author information

Corresponding author

Correspondence to Muhammad Latif Anjum.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (MP4 95,965 KB)

About this article

Cite this article

Naveed, K., Anjum, M.L., Hussain, W. et al. Deep introspective SLAM: deep reinforcement learning based approach to avoid tracking failure in visual SLAM. Auton Robot 46, 705–724 (2022). https://doi.org/10.1007/s10514-022-10046-9

Keywords

Navigation