Abstract
Complex outdoor environments involve variable illumination, dynamic objects, and occlusions, so visual odometry outdoors remains a challenging problem. This work trains a self-supervised pose estimation network based on the projection between adjacent frames. The pose between two adjacent frames is a 6D vector, and it can be estimated more reliably from trustworthy pixels than from all pixels; therefore, only reliable pixels are considered during training. We propose a boundary mask and an inferior-projection mask to eliminate the influence of unreliable pixels. We evaluate the proposed method on the KITTI dataset and compare it with other state-of-the-art deep-learning-based visual odometry methods. The results show that the proposed method detects inferior-projection and boundary pixels well, and it achieves higher accuracy in pose estimation.
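The boundary mask described above can be illustrated with a minimal sketch: given a depth map, camera intrinsics, and a relative 6-DoF pose, each pixel of frame t is back-projected to 3D, transformed into frame t+1, and reprojected; pixels whose projection falls outside the image are masked out. This is an assumption-laden NumPy illustration of the general projection step, not the authors' implementation; the function name `boundary_mask` and its interface are hypothetical.

```python
import numpy as np

def boundary_mask(depth, K, T, eps=1e-6):
    """Project every pixel of frame t into frame t+1 and keep only
    pixels whose projection lands inside the image bounds.

    depth: (H, W) depth map of frame t
    K:     (3, 3) camera intrinsic matrix
    T:     (4, 4) relative pose from frame t to frame t+1
    Returns a boolean (H, W) mask (True = reliable boundary-wise).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # homogeneous pixel coordinates, shape (3, H*W)
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # back-project to 3D camera coordinates of frame t
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    # rigid transform into frame t+1, then project with the intrinsics
    proj = K @ (T @ cam_h)[:3]
    z = proj[2]
    u2 = proj[0] / np.maximum(z, eps)
    v2 = proj[1] / np.maximum(z, eps)
    # valid only if the reprojection stays in front of the camera
    # and inside the image rectangle
    inside = (u2 >= 0) & (u2 <= W - 1) & (v2 >= 0) & (v2 <= H - 1) & (z > eps)
    return inside.reshape(H, W)
```

With an identity pose every pixel reprojects onto itself, so the mask is all True; a large lateral translation pushes projections out of the image and the mask empties, which is exactly the behavior a boundary mask relies on during training.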








Cite this article
Zhou, S., Yang, Z., Zhu, M. et al. Higher accuracy self-supervised visual odometry with reliable projection. Artif Life Robotics 27, 568–575 (2022). https://doi.org/10.1007/s10015-022-00766-7