
A review on monocular tracking and mapping: from model-based to data-driven methods

  • Survey
  • Published in: The Visual Computer

Abstract

Visual odometry and visual simultaneous localization and mapping (SLAM) track the position of a camera and map its surroundings from images, and both are important components of robotic perception. Tracking and mapping with a monocular camera is cost-effective, requires little calibration effort, and is easy to deploy across a wide range of applications. This paper provides an extensive review of developments in monocular tracking and mapping over the first two decades of the twenty-first century. Striking results from the early filtering-based methods inspired the community to extend these algorithms with other techniques such as bundle adjustment and deep learning. The article first introduces the basic sensor systems and analyzes the evolution of monocular tracking and mapping algorithms through bibliometric data. It then gives an overview of filtering and bundle adjustment methods, followed by recent advances in deep learning-based methods and the mathematical constraints applied to the networks. Finally, the popular benchmarks available for developing and evaluating these algorithms are presented, along with a comparative study across the different classes of algorithms. It is anticipated that this article will serve as an up-to-date introduction and further stimulate the community's interest in solving current and future impediments.
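As background for the model-based pipeline the review covers, the following minimal sketch (an illustrative assumption, not the paper's implementation) shows the classical frame-to-frame monocular tracking step: detect and match features between two frames, estimate the essential matrix with RANSAC, and recover the relative camera pose with OpenCV. The image paths and camera intrinsics are placeholders, and the recovered translation is defined only up to scale, which is the well-known limitation of monocular setups.

# Minimal two-frame monocular visual odometry sketch (illustrative only).
# Estimates the relative rotation R and unit-scale translation t between two
# frames via ORB matching, essential-matrix estimation, and pose recovery.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Return (R, t) of frame 2 relative to frame 1; t is recovered only up to scale."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Brute-force Hamming matching with cross-check gives putative correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC rejects outlier matches while fitting the essential matrix.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # The cheirality check selects the valid (R, t) decomposition of E.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t

if __name__ == "__main__":
    # Placeholder intrinsics (KITTI-like values) and frame paths; replace with your own.
    K = np.array([[718.856, 0.0, 607.193],
                  [0.0, 718.856, 185.216],
                  [0.0, 0.0, 1.0]])
    f1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
    f2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
    R, t = relative_pose(f1, f2, K)
    print("R =\n", R, "\nt (up to scale) =", t.ravel())

A full system would chain such relative estimates over time, triangulate landmarks, and refine both with filtering or bundle adjustment, which is the trajectory the review traces from model-based to data-driven methods.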





Acknowledgements

The authors are grateful to the sponsors who provided the YUTP Grant (015LC0-243) for this project.

Author information


Corresponding author

Correspondence to Irraivan Elamvazuthi.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gadipudi, N., Elamvazuthi, I., Izhar, L.I. et al. A review on monocular tracking and mapping: from model-based to data-driven methods. Vis Comput 39, 5897–5924 (2023). https://doi.org/10.1007/s00371-022-02702-z


Keywords

Navigation