Abstract
Estimating accurate depth from an RGB image in any environment is a challenging task in computer vision. Recent learning-based methods using deep Convolutional Neural Networks (CNNs) have produced plausible results, but these conventional methods struggle with scenes involving a pure camera rotation, such as in-plane rolling. This movement perturbs learning-based methods because the gravity direction acts as a strong prior for CNN depth estimation (i.e., the top region of an image tends to have a relatively large depth, whereas the bottom region tends to have a small depth). To overcome this crucial weakness of CNN-based depth estimation, we propose a simple but effective refinement method that incorporates in-plane roll alignment using camera poses obtained from monocular Simultaneous Localization and Mapping (SLAM). For the experiments, we used public datasets and also created our own dataset composed mostly of in-plane roll camera movements. Evaluation results on these datasets show the effectiveness of our approach.
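As a concrete illustration (not the authors' code), the sketch below shows how such a roll-aligned refinement could look: the in-plane roll is recovered from the relative rotation between a roll-free reference pose and the current SLAM pose, the image is counter-rotated before the depth CNN runs, and the predicted depth map is rotated back. The function names, the reference-frame assumption, and the `depth_net` callable are all hypothetical.

```python
# A minimal sketch of roll-aligned depth refinement, assuming
# world-from-camera rotation matrices from monocular SLAM and an
# arbitrary single-view depth CNN. Not the authors' implementation.
import numpy as np
import cv2


def in_plane_roll(R_ref, R_cur):
    """Best-fit in-plane rotation (degrees) between two camera poses.

    R_ref, R_cur: 3x3 world-from-camera rotation matrices from SLAM,
    where R_ref is a frame assumed to be roll-free (e.g., the first
    tracked frame). The formula projects the relative rotation's
    top-left 2x2 block onto a pure rotation about the optical (z) axis.
    """
    R_rel = R_ref.T @ R_cur
    return np.degrees(np.arctan2(R_rel[1, 0] - R_rel[0, 1],
                                 R_rel[0, 0] + R_rel[1, 1]))


def refine_depth(image, R_ref, R_cur, depth_net):
    """Counter-rotate the image before CNN inference, rotate depth back.

    The sign convention depends on the SLAM coordinate frame; flip the
    angle if results look mirrored on your data.
    """
    roll = in_plane_roll(R_ref, R_cur)
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    # Undo the camera roll so gravity points "down" in the image, as the
    # upright training data of the depth CNN implicitly assumes.
    M_fwd = cv2.getRotationMatrix2D(center, roll, 1.0)
    upright = cv2.warpAffine(image, M_fwd, (w, h))
    depth = depth_net(upright)  # hypothetical single-view depth CNN
    # Rotate the predicted depth map back to the original orientation.
    M_bwd = cv2.getRotationMatrix2D(center, -roll, 1.0)
    return cv2.warpAffine(depth, M_bwd, (w, h))
```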
Notes
- 1. A rotary motion around the optical axis in the camera coordinate system.
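For concreteness, assuming the standard pinhole convention in which the optical axis is the camera z-axis, a pure in-plane roll by an angle $\theta$ corresponds to the rotation matrix

$$
R_z(\theta) =
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{pmatrix}.
$$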
Acknowledgement
This work was partially supported by the Japan Science and Technology Agency (JST) under grants JPMJMI19B2 and JPMJCR1683.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saito, Y., Hachiuma, R., Yamaguchi, M., Saito, H. (2020). In-Plane Rotation-Aware Monocular Depth Estimation Using SLAM. In: Ohyama, W., Jung, S. (eds) Frontiers of Computer Vision. IW-FCV 2020. Communications in Computer and Information Science, vol 1212. Springer, Singapore. https://doi.org/10.1007/978-981-15-4818-5_23
DOI: https://doi.org/10.1007/978-981-15-4818-5_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4817-8
Online ISBN: 978-981-15-4818-5
eBook Packages: Computer Science, Computer Science (R0)