Image and Vision Computing

Volume 68, December 2017, Pages 3-13

Predictive monocular odometry (PMO): What is possible without RANSAC and multiframe bundle adjustment?

https://doi.org/10.1016/j.imavis.2017.08.002

Abstract

Visual odometry using only a monocular camera faces more algorithmic challenges than stereo odometry. We present a robust monocular visual odometry framework for automotive applications. An extended propagation-based tracking framework is proposed which yields highly accurate (unscaled) pose estimates. Metric scale is supplied by ground-plane pose estimation, which employs street-pixel labeling obtained from a convolutional neural network (CNN). The proposed framework has been extensively tested on the KITTI dataset and achieves a higher rank than currently published state-of-the-art monocular methods in the KITTI odometry benchmark. Unlike other VO/SLAM methods, this result is achieved without a loop-closing mechanism, without RANSAC and also without multiframe bundle adjustment. Thus, we challenge the common belief that robust systems can only be built using iterative robustification tools such as RANSAC.

Introduction

In today's traffic, human drivers largely rely on visual information: traffic signs, traffic lights and road markings are only some examples of crucial visual cues. This is why, even though non-visual sensors such as radar and LIDAR are also commonplace in advanced driver assistance systems (ADAS), there is still a strong need to employ visual information for navigation. Substantial advances have been made over the past ten years in the processing of visual data coming from stereo cameras. The obvious next step is to ask what is feasible with monocular systems, possibly as an intermediate step towards multi-monocular surround view.

However, visual sensing in traffic scenarios is difficult. When an algorithm solves, for the first time, a problem that has long vexed the scientific community, it often leaves in its wake a long-lasting belief that this soon-to-be-classic approach is the only correct way to tackle the problem.

Visual odometry (VO) and SLAM are core tasks in an area where a certain set of standard mainstream techniques has evolved over the past two to three decades of intensive research: complex feature detectors and descriptors, descriptor-based matching, RANSAC-supported PnP for relative pose estimation and multiframe bundle adjustment are just some of the dominant key terms in discussions of state-of-the-art VO/SLAM approaches. The switch from geometry-based optimization to 'direct methods' is only one recent example of the paradigm shifts that occur from time to time.

In the present paper, we show that a system consisting of components that are, at least to a large degree, located off the mainstream of visual odometry can lead to superior performance if the constraints of the application area at hand (here: autonomous driving and driver assistance) are considered properly. Our method, named 'predictive monocular odometry' (PMO), achieves the highest rank among published monocular methods on the challenging KITTI benchmark [1] and even outperforms some stereo methods.

It is important to note that our results have been achieved without RANSAC; PMO thus stands in contrast to mainstream camera-based methods, which heavily rely on iterative robustification. Our results have also been achieved without multiframe bundle adjustment (MFBA), a very useful tool in sparse VO/VSLAM for globally optimizing the ego-motion estimates. This means that our results give an impression of the level of accuracy that is possible without MFBA. It also means that all results can probably be further improved through such global optimization.

Fig. 1 shows the main concept of PMO. This paper details all components of the PMO framework and extends work previously published in [2], [3]. The most recent extension in the presented scheme is the use of semantic segmentation by a CNN in order to robustify the scale estimation.
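To illustrate how a CNN street mask could robustify scale estimation, the following sketch filters ground-plane candidate keypoints so that only those lying on CNN-labeled street pixels survive. The function name, the mask representation and the neighborhood check are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def filter_ground_candidates(keypoints, street_mask, min_margin=2):
    """Keep only keypoints that fall on CNN-labeled street pixels.

    street_mask is a boolean HxW array (True = street); min_margin requires
    a small all-street neighborhood around each keypoint, guarding against
    label noise at the mask boundary. Names are illustrative.
    """
    h, w = street_mask.shape
    kept = []
    for u, v in keypoints:
        u, v = int(round(u)), int(round(v))
        u0, u1 = max(u - min_margin, 0), min(u + min_margin + 1, w)
        v0, v1 = max(v - min_margin, 0), min(v + min_margin + 1, h)
        if street_mask[v0:v1, u0:u1].all():
            kept.append((u, v))
    return kept

# toy mask: bottom half of a 10x10 image is street
mask = np.zeros((10, 10), dtype=bool)
mask[5:, :] = True
print(filter_ground_candidates([(4, 8), (4, 2)], mask))  # -> [(4, 8)]
```

Only the keypoint in the lower (street) half of the toy image passes the filter; in a real system the mask would come from the segmentation CNN's per-pixel street labels.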

Section snippets

Related work

State-of-the-art approaches for visual odometry, such as those appearing in leading positions of the KITTI benchmark, can be categorized according to several characteristics. One obvious dichotomy is that between direct approaches, optimizing a photometric error (such as LSD-SLAM [4] or DSO [5]) on one hand, and indirect methods optimizing a geometric error (such as monoSLAM [6], PTAM [7] or ORB-SLAM [8]) on the other. Due to their common principle of inferring 3D motion and 3D structure from …

Approach: extended propagation-based tracking (PbT)

The fundamental characteristic of the approach that we present in this paper is that it is strongly recursive. In accordance with the excellent predictability of car motion (except in rare cases such as speed bumps and potholes), we exploit the information collected about ego-motion and the 3D structure of the environment up to frame n − 1 and use this to guide subsequent steps of keypoint matching and tracking from frame n − 1 to n. This strategy is mainly responsible for the capability of the …
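The prediction step described above can be sketched as follows: given 3D points reconstructed up to frame n − 1 and a predicted relative pose (e.g. from a constant-motion model), project the points into frame n to obtain predicted keypoint positions that guide matching. The function name, the pinhole model and the toy numbers are our assumptions for illustration; the paper's actual propagation-based tracker is more elaborate.

```python
import numpy as np

def predict_keypoints(K, X_prev, T_pred):
    """Project 3D points (given in the camera frame of frame n-1) into
    frame n using a predicted relative pose T_pred = [R | t] (3x4),
    e.g. obtained from a constant-motion assumption."""
    R, t = T_pred[:, :3], T_pred[:, 3]
    X_n = X_prev @ R.T + t           # transform points into frame n
    uv = X_n @ K.T                   # pinhole projection with intrinsics K
    return uv[:, :2] / uv[:, 2:3]    # normalize homogeneous coordinates

# toy example: no rotation, camera driving 1 m forward between frames
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
T_pred = np.hstack([np.eye(3), [[0.0], [0.0], [-1.0]]])  # point moves 1 m closer
X_prev = np.array([[1.0, 0.5, 10.0]])                    # one 3D point at 10 m
print(predict_keypoints(K, X_prev, T_pred))
```

The predicted pixel position then serves as the center of a small search region for matching in frame n, which is what makes the scheme recursive rather than relying on global descriptor matching.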

Experiments

We test the proposed method on the KITTI dataset [1]. We set the following parameters for the scale estimation: τs = 1 m/s, τh = 20 cm, τv = 300 km/h, τssd = 50 and τa = 20°. To allow a thorough analysis of the results, the simulations were first carried out on all KITTI training sequences, where ground-truth poses are available, thus allowing the computation of the accuracy. Beyond this absolute evaluation, we compare our results against three state-of-the-art monocular methods: VISO2-M …
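The thresholds listed above suggest plausibility gates on per-frame ground-plane scale estimates: a minimum and maximum vehicle speed (τs, τv), a tolerated deviation of the estimated camera height (τh), a maximum patch SSD residual (τssd) and a maximum tilt of the estimated plane normal (τa). The sketch below applies such gates; the function signature, the combination logic and the nominal KITTI camera height of about 1.65 m are our illustrative assumptions.

```python
def scale_estimate_plausible(speed_mps, cam_height_m, ssd, normal_angle_deg,
                             nominal_height_m=1.65):
    """Accept a per-frame ground-plane scale estimate only if it passes
    plausibility gates derived from the thresholds in the experiments
    section. The gating logic and nominal camera height are illustrative."""
    tau_s = 1.0                # minimum speed, m/s
    tau_v = 300.0 / 3.6        # maximum speed, m/s (300 km/h)
    tau_h = 0.20               # allowed camera-height deviation, m (20 cm)
    tau_ssd = 50.0             # maximum patch SSD residual
    tau_a = 20.0               # max angle of plane normal vs. vertical, deg
    return (tau_s <= speed_mps <= tau_v
            and abs(cam_height_m - nominal_height_m) <= tau_h
            and ssd <= tau_ssd
            and normal_angle_deg <= tau_a)

print(scale_estimate_plausible(13.9, 1.70, 12.0, 5.0))  # typical highway frame
print(scale_estimate_plausible(13.9, 2.10, 12.0, 5.0))  # implausible camera height
```

Frames failing such gates would contribute no scale measurement, so the scale estimate falls back on the propagated value from previous frames.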

Conclusion

In this paper, we have presented PMO, a predictive monocular odometry system for automotive applications. The core of PMO is an extended propagation-based tracking scheme which provides initially unscaled motion information and keypoint tracks. By applying a multi-modal ground-plane estimation method, significantly robustified by exploiting street masks computed by a CNN, we are able to compute metrically scaled motion information and also significantly improve the accuracy of the …

References (31)

  • A. Geiger et al., Vision meets robotics: the KITTI dataset, Int. J. Robot. Res. (2013)
  • N. Fanani et al., Keypoint trajectory estimation using propagation based tracking
  • N. Fanani et al., Multimodal scale estimation for monocular visual odometry
  • J. Engel et al., LSD-SLAM: large-scale direct monocular SLAM
  • J. Engel et al., Direct sparse odometry (2017)
  • A.J. Davison et al., MonoSLAM: real-time single camera SLAM, Trans. Pattern Anal. Mach. Intell. (PAMI) (2007)
  • G. Klein et al., Parallel tracking and mapping for small AR workspaces
  • R. Mur-Artal et al., ORB-SLAM: a versatile and accurate monocular SLAM system, Trans. Robot. (2015)
  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (1981)
  • M. Buczko et al., How to distinguish inliers from outliers in visual odometry for high-speed automotive applications
  • S. Song et al., Parallel, real-time monocular visual odometry
  • M.H. Mirabdollah et al., Fast techniques for monocular visual odometry
  • A. Geiger et al., StereoScan: dense 3D reconstruction in real-time
  • H. Bradler et al., The statistics of driving sequences - and what we can learn from them
  • M. Trummer et al., Extending GKLT tracking - feature tracking for controlled environments with integrated uncertainty estimation


    This paper has been recommended for acceptance by Branislav Kisacanin.
