
Higher accuracy self-supervised visual odometry with reliable projection

  • Original Article
  • Published in: Artificial Life and Robotics

Abstract

Complex outdoor environments include variable illumination, dynamic objects, and occlusion, which make visual odometry outdoors a persistently challenging problem. This work trains a self-supervised pose estimation network based on the projection between adjacent frames. The pose between two adjacent frames is a 6D vector, and it can be solved more accurately from reliable pixels than from all pixels. Thus, only reliable pixels are considered during training. This research proposes a boundary mask and an inferior-projection mask to eliminate the influence of unreliable pixels. We evaluate the proposed method on the KITTI datasets and compare it with other state-of-the-art deep-learning-based visual odometry methods. The results show that the proposed method detects inferior-projection and boundary pixels well, and it achieves higher accuracy in pose estimation.
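The idea of restricting the training loss to reliable pixels can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function names (`reproject`, `reliability_masks`) and the error threshold are assumptions for the example. Pixels of the source frame are back-projected with their depth, transformed by the estimated 6D pose (here as a 4x4 matrix), and re-projected into the target frame; a boundary mask keeps projections that land inside the image, and an inferior-projection mask keeps pixels whose photometric error is low.

```python
import numpy as np

def reproject(depth, pose, K):
    """Project every pixel of the source frame into the target frame.

    depth : (H, W) per-pixel depth of the source frame
    pose  : (4, 4) rigid transform from source to target camera
    K     : (3, 3) camera intrinsics
    Returns the (H, W, 2) projected pixel coordinates in the target image.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    # Back-project to 3D camera coordinates, apply the pose, re-project.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    proj = K @ (pose @ pts_h)[:3]
    proj = proj[:2] / np.clip(proj[2], 1e-6, None)  # perspective divide
    return proj.T.reshape(H, W, 2)

def reliability_masks(coords, photo_err, H, W, err_thresh=0.2):
    """Combine the two reliability masks from the paper's idea.

    Boundary mask: the projected pixel must fall inside the target image.
    Inferior-projection mask: the photometric error must be below a
    threshold (err_thresh is an assumed value for illustration).
    """
    boundary = ((coords[..., 0] >= 0) & (coords[..., 0] <= W - 1) &
                (coords[..., 1] >= 0) & (coords[..., 1] <= H - 1))
    inferior = photo_err < err_thresh
    return boundary & inferior
```

Only pixels where the combined mask is true would contribute to the self-supervised photometric loss, so out-of-view pixels and poorly projected pixels (e.g. on dynamic objects) do not corrupt the pose gradient.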




Author information

Corresponding author

Correspondence to Shi Zhou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Zhou, S., Yang, Z., Zhu, M. et al. Higher accuracy self-supervised visual odometry with reliable projection. Artif Life Robotics 27, 568–575 (2022). https://doi.org/10.1007/s10015-022-00766-7


Keywords

  • Navigation