In-Plane Rotation-Aware Monocular Depth Estimation Using SLAM

Conference paper in Frontiers of Computer Vision (IW-FCV 2020)

Abstract

Estimating accurate depth from an RGB image in an arbitrary environment is a challenging task in computer vision. Recent learning-based methods using deep Convolutional Neural Networks (CNNs) produce plausible-looking results, but they perform poorly on scenes captured under pure camera rotation, such as in-plane rolling. This motion perturbs learning-based estimators because the gravity direction serves as a strong prior for CNN depth estimation (i.e., the top region of an image tends to have large depth, whereas the bottom region tends to have small depth). To overcome this weakness, we propose a simple but effective refinement method that incorporates in-plane roll alignment using camera poses from monocular Simultaneous Localization and Mapping (SLAM). For the experiments, we used public datasets and also created our own dataset composed mostly of in-plane roll camera movements. Evaluation results on these datasets show the effectiveness of our approach.
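As a rough sketch of the idea described above (not the authors' published implementation), the refinement can be pictured as: read the roll angle from the SLAM camera pose, counter-rotate the frame so gravity points downward, run the depth CNN on the upright view, and rotate the predicted depth map back. In the Python sketch below, depth_net is a hypothetical depth-prediction callable, R_wc is a 3x3 world-from-camera rotation from SLAM, and the world y-axis is assumed to be parallel to gravity; exact sign and axis conventions vary by SLAM system.

    import numpy as np
    import cv2

    def roll_from_pose(R_wc: np.ndarray) -> float:
        # Camera x-axis ("right" direction) expressed in world coordinates.
        right_in_world = R_wc[:, 0]
        # In-plane roll: angle of the camera's right axis against the
        # world horizontal plane (world y assumed parallel to gravity).
        return np.arctan2(right_in_world[1], right_in_world[0])

    def rotation_aware_depth(image: np.ndarray, R_wc: np.ndarray, depth_net):
        # depth_net: hypothetical callable, HxWx3 image -> HxW depth map.
        h, w = image.shape[:2]
        roll_deg = float(np.degrees(roll_from_pose(R_wc)))
        center = (w / 2.0, h / 2.0)
        # Counter-rotate the frame so gravity points downward for the CNN.
        M_fwd = cv2.getRotationMatrix2D(center, -roll_deg, 1.0)
        upright = cv2.warpAffine(image, M_fwd, (w, h))
        depth_upright = depth_net(upright)
        # Undo the alignment so the depth map matches the original frame.
        M_bwd = cv2.getRotationMatrix2D(center, roll_deg, 1.0)
        return cv2.warpAffine(depth_upright, M_bwd, (w, h))

Because the network then always sees an approximately upright view, the learned vertical depth prior (far at the top, near at the bottom) stays valid even under heavy in-plane roll; in practice, the border pixels introduced by the warp would need masking or cropping.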

Notes

  1. A rotary motion around the optical axis in the camera coordinate system.

  2. https://github.com/NetEaseAI-CVLab/CNN-MonoFusion.

Acknowledgement

This work was partially supported by the Japan Science and Technology Agency (JST) under grants JPMJMI19B2 and JPMJCR1683.

Author information

Correspondence to Yuki Saito, Ryo Hachiuma, Masahiro Yamaguchi or Hideo Saito.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saito, Y., Hachiuma, R., Yamaguchi, M., Saito, H. (2020). In-Plane Rotation-Aware Monocular Depth Estimation Using SLAM. In: Ohyama, W., Jung, S. (eds) Frontiers of Computer Vision. IW-FCV 2020. Communications in Computer and Information Science, vol 1212. Springer, Singapore. https://doi.org/10.1007/978-981-15-4818-5_23

  • DOI: https://doi.org/10.1007/978-981-15-4818-5_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-4817-8

  • Online ISBN: 978-981-15-4818-5

  • eBook Packages: Computer Science (R0)
