Predictive monocular odometry (PMO): What is possible without RANSAC and multiframe bundle adjustment?☆
Introduction
In today's traffic, human drivers rely largely on visual information: traffic signs, traffic lights and road markings are only some examples of crucial visual cues. This is why, even though non-visual sensors such as radar and LIDAR are also commonplace in advanced driver assistance systems (ADAS), there is still a strong need to employ visual information for navigation. Substantial advances have been made over the past decade in the processing of visual data from stereo cameras. The obvious next step is to ask what is feasible with monocular systems, possibly as an intermediate step towards multi-monocular surround view.
However, visual sensing in traffic scenarios is difficult. Algorithms that, for the first time, successfully solve a problem that has long vexed the scientific community often leave in their wake a long-lasting belief that the soon-to-be-classic approach is the only correct way to tackle the problem.
Visual odometry (VO) and SLAM are core tasks in an area where a certain set of standard mainstream techniques has evolved over the past two to three decades of intensive research: complex feature detectors and descriptors, descriptor-based matching, RANSAC-supported PnP for relative pose estimation, and multiframe bundle adjustment are just some of the dominant key terms in discussions of state-of-the-art VO/SLAM approaches. The switch from geometry-based optimization to ‘direct methods’ is only one recent example of the paradigm shifts that occur from time to time.
In the present paper, we will show that a system consisting of components that are, at least to a large degree, located off the mainstream of visual odometry can lead to superior performance if the constraints of the application area at hand (here: autonomous driving and driver assistance) are considered properly. Our method, named ‘predictive monocular odometry’ (PMO), achieves the highest rank among published monocular methods on the challenging KITTI benchmark [1] and even outperforms some stereo methods.
It is important to note that our results have been achieved without RANSAC; PMO thus stands in contrast to mainstream camera-based methods, which rely heavily on iterative robustification. Our results have also been achieved without multiframe bundle adjustment (MFBA), a very useful tool in sparse VO/VSLAM for globally optimizing the egomotion estimates. This means that our results give an impression of the level of accuracy that is possible without MFBA. It also means that all results can probably still be further improved through such global optimization.
Fig. 1 shows the main concept of PMO. This paper details all components of the PMO framework. It presents the extension of work previously published in [2], [3]. The most recent extension in the presented scheme is to employ semantic segmentation by a CNN in order to robustify the scale estimation.
Section snippets
Related work
State-of-the-art approaches for visual odometry, such as those appearing in leading positions of the KITTI benchmark, can be categorized according to several characteristics. One obvious dichotomy is that between direct approaches, optimizing a photometric error (such as LSD-SLAM [4] or DSO [5]) on one hand, and indirect methods optimizing a geometric error (such as monoSLAM [6], PTAM [7] or ORB-SLAM [8]) on the other. Due to their common principle of inferring 3D motion and 3D structure from
Approach: extended propagation-based tracking (PbT)
The fundamental characteristic of the approach that we present in this paper is that it is strongly recursive. In accordance with the excellent predictability of car motion (except in rare cases such as speed bumps and potholes), we exploit the information collected about ego-motion and the 3D structure of the environment up to frame n − 1 and use this to guide subsequent steps of keypoint matching and tracking from frame n − 1 to n. This strategy is mainly responsible for the capability of the
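The predictive step described above can be sketched as follows. This is an illustrative reduction, not PMO's exact formulation: the constant-motion prediction, the function names, and the KITTI-like intrinsics are assumptions; the idea shown is merely that a predicted pose change plus an estimated depth tells the tracker where to search for each keypoint in frame n.

```python
import numpy as np

def predict_keypoint(K, R_pred, t_pred, pt, depth):
    """Project a keypoint from frame n-1 into frame n using a predicted motion.

    K      : 3x3 camera intrinsics
    R_pred : predicted 3x3 rotation from frame n-1 to frame n
    t_pred : predicted translation (3-vector), same unit as depth
    pt     : (u, v) pixel position in frame n-1
    depth  : estimated depth of the point in the camera frame of n-1
    """
    # Back-project the pixel to a 3D point in the camera frame of n-1
    ray = np.linalg.inv(K) @ np.array([pt[0], pt[1], 1.0])
    X = depth * ray
    # Apply the predicted motion and re-project into frame n
    X_n = R_pred @ X + t_pred
    uvw = K @ X_n
    return uvw[:2] / uvw[2]

# KITTI-like intrinsics (illustrative values)
K = np.array([[718.856, 0.0, 607.1928],
              [0.0, 718.856, 185.2157],
              [0.0, 0.0, 1.0]])
# Constant-velocity prediction: the car drives 1 m straight ahead,
# so in camera coordinates the scene point moves 1 m closer (z -= 1).
R_pred = np.eye(3)
t_pred = np.array([0.0, 0.0, -1.0])
# The matcher would now search only a small window around this prediction.
print(predict_keypoint(K, R_pred, t_pred, (650.0, 200.0), depth=20.0))
```

Restricting the search to a small window around the predicted position is what makes descriptor-free, propagation-based matching feasible.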
Experiments
We test the proposed method on the KITTI dataset [1]. We set the following parameters for the scale estimation: τs = 1 m/s, τh = 20 cm, τv = 300 km/h, τssd = 50 and τa = 20°. In order to allow a thorough analysis of the results, the simulations were first carried out on all KITTI training sequences for which the ground-truth poses are available, thus allowing the computation of the accuracy. Beyond this absolute evaluation, we compare our results relative to three state-of-the-art monocular methods: VISO2-M
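The listed thresholds could be collected in a small parameter container such as the one below. Note that the interpretation given here (e.g. that τs and τv bound the plausible ego-speed of a scale hypothesis) is an assumption made for illustration; the paper defines the exact role of each threshold.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScaleEstimationParams:
    tau_s: float = 1.0          # minimum plausible speed [m/s]
    tau_h: float = 0.20         # ground-plane height tolerance [m] (20 cm)
    tau_v: float = 300.0 / 3.6  # maximum plausible speed [m/s] (300 km/h)
    tau_ssd: float = 50.0       # patch-matching SSD threshold
    tau_a: float = 20.0         # maximum plane-normal angle deviation [deg]

def speed_is_plausible(speed_mps: float, p: ScaleEstimationParams) -> bool:
    """Reject scale hypotheses that would imply an implausible ego-speed
    (hypothetical gating rule, assumed for illustration)."""
    return p.tau_s <= speed_mps <= p.tau_v

p = ScaleEstimationParams()
print(speed_is_plausible(15.0, p))   # ~54 km/h, within [τs, τv]
print(speed_is_plausible(120.0, p))  # 120 m/s = 432 km/h, above τv
```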
Conclusion
In this paper, we have presented PMO, a predictive monocular odometry system for automotive applications. The core of PMO is an extended propagation-based tracking scheme which provides initially unscaled motion information and keypoint tracks. By applying a multi-modal ground plane estimation method which is significantly robustified by exploiting street masks computed by a CNN, we are able to compute metrically scaled motion information and also significantly improve the accuracy of the
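The scale-recovery principle behind the ground-plane approach can be sketched as follows: if the camera's true mounting height above the road is known (about 1.65 m on the KITTI car), the ratio between that height and the ground-plane height in the unscaled reconstruction yields the metric scale. This is a minimal sketch under those assumptions; the CNN street mask is reduced to a boolean array, and the simple least-squares plane fit stands in for PMO's multi-modal estimator.

```python
import numpy as np

def scale_from_ground_plane(points, street_mask, true_cam_height=1.65):
    """points: Nx3 unscaled 3D points in the camera frame (y pointing down);
    street_mask: length-N boolean array marking road pixels (e.g. from a CNN)."""
    ground = points[street_mask]
    # Fit a plane y = a*x + b*z + c to the road points (least squares)
    A = np.column_stack([ground[:, 0], ground[:, 2], np.ones(len(ground))])
    (a, b, c), *_ = np.linalg.lstsq(A, ground[:, 1], rcond=None)
    # Perpendicular distance from the camera origin to the fitted plane
    est_height = abs(c) / np.sqrt(a**2 + b**2 + 1.0)
    # Metric scale = known camera height / unscaled plane height
    return true_cam_height / est_height

# Synthetic check: a flat road at unscaled height 0.55 gives scale 1.65/0.55 = 3.0
rng = np.random.default_rng(0)
xz = rng.uniform(-5.0, 20.0, size=(200, 2))
pts = np.column_stack([xz[:, 0], np.full(200, 0.55), xz[:, 1]])
print(scale_from_ground_plane(pts, np.ones(200, dtype=bool)))
```

Restricting the plane fit to street-mask points is what protects the estimate from non-road structure such as parked cars or vegetation.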
References (31)
- et al., Vision meets robotics: the KITTI dataset, Int. J. Robot. Res. (2013)
- et al., Keypoint trajectory estimation using propagation based tracking
- et al., Multimodal scale estimation for monocular visual odometry
- et al., LSD-SLAM: large-scale direct monocular SLAM
- et al., Direct sparse odometry (2017)
- et al., MonoSLAM: real-time single camera SLAM, Trans. Pattern Anal. Mach. Intell. (PAMI) (2007)
- et al., Parallel tracking and mapping for small AR workspaces
- et al., ORB-SLAM: a versatile and accurate monocular SLAM system, Trans. Robot. (2015)
- et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (1981)
- et al., How to distinguish inliers from outliers in visual odometry for high-speed automotive applications
- Parallel, real-time monocular visual odometry
- Fast techniques for monocular visual odometry
- Stereoscan: dense 3D reconstruction in real-time
- The statistics of driving sequences - and what we can learn from them
- Extending GKLT tracking: feature tracking for controlled environments with integrated uncertainty estimation
☆ This paper has been recommended for acceptance by Branislav Kisacanin.