Passive navigation using egomotion estimates

https://doi.org/10.1016/S0262-8856(99)00082-7

Abstract

This work proposes a method for solving the problem of passive navigation by visual means. Passive navigation is the ability of an autonomous agent to determine its motion with respect to the environment. The two main egomotion parameters that allow passive navigation to be performed are the heading direction and the time to collision with the environment. Many approaches have been proposed in the literature to estimate these parameters, most of which work well only if the motion is a predominantly forward translation and only small amounts of noise are present in the input data.

The method we propose is a two-stage approach: matching of features extracted from 2D images of a sequence at different times, and egomotion parameter computation. Both algorithms are based on optimization approaches that minimize appropriate energy functions. The novelty of the proposed approach is the formulation of the matching energy function so as to incorporate invariant cues of the scene. The matching stage recovers correspondences between sparse high-interest feature points of two successive images, which are then used in the second stage to estimate the egomotion parameters. Experimental results obtained in a real context show the robustness of the method.

Introduction

Passive navigation is the ability of an autonomous agent to determine its motion with respect to the environment. In many dynamic applications, such as autonomous robot navigation and car driving, the capacity for passive navigation is a prerequisite for any other navigational functionality.

The goal is to estimate the egomotion of a mobile vehicle moving in a stationary, man-made indoor environment, on a flat surface, in order to perform on-line adjustments to the current navigational path and to avoid obstacles. Psychophysical evidence [1] shows that the two main egomotion parameters that allow living organisms to perform these tasks are the Heading Direction (HD) and the Time To Collision (TTC).

Heading information can be obtained from odometers or gyros, but with an unbounded incremental error, and time to collision can be estimated by sonar or infrared sensors, but with large measurement inaccuracies; both tasks can be performed by visual sensors with lower uncertainty.

Useful visual information can be obtained from a camera mounted on the mobile vehicle. The vehicle motion through the surrounding environment produces spatial and temporal changes in the viewed image giving rise to a 2D motion field that can be used for the reconstruction of both the vehicle motion and the 3D structure of the scene.

We consider the application context in which the viewer translates on a flat ground (floor) and can rotate only around an axis orthogonal to the ground. In this context, the heading direction is projected onto the image plane at a singular point, the Focus of Expansion (FOE), where the 2D motion field vanishes. The resulting 2D motion field has a radial topology, with all 2D velocity vectors radiating from the FOE location. Rotations occurring while the observer translates only shift the FOE location, without perturbing the radial shape of the 2D motion field.
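To make the geometry concrete, the following sketch (illustrative only; values, variable names and the unit focal length are our assumptions) builds a purely translational flow field for features at arbitrary depths and checks that every displacement vector is collinear with the ray joining the feature to the FOE at (Tx/Tz, Ty/Tz):

```python
import numpy as np

# Illustrative sketch (not from the paper): for a pure translation T = (Tx, Ty, Tz),
# Tz != 0, the flow at image point (x, y) with depth Z(x, y) is
#   u = (Tx - x*Tz) / Z,   v = (Ty - y*Tz) / Z      (focal length normalized to 1),
# which vanishes at the Focus of Expansion FOE = (Tx/Tz, Ty/Tz).
T = np.array([0.2, 0.1, 1.0])                    # assumed translation (arbitrary units)
foe = T[:2] / T[2]                               # FOE on the image plane

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(50, 2))       # feature points (normalized coordinates)
Z = rng.uniform(2.0, 10.0, size=50)              # arbitrary positive depths

flow = (T[:2] - pts * T[2]) / Z[:, None]         # translational motion field

# The field is radial about the FOE whatever the (unknown) depths are:
# each flow vector is a scalar multiple of (foe - p).
radial = foe - pts
cosine = np.sum(flow * radial, axis=1) / (
    np.linalg.norm(flow, axis=1) * np.linalg.norm(radial, axis=1))
assert np.allclose(np.abs(cosine), 1.0)
print("FOE:", foe)
```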

Several methods have been proposed to estimate the egomotion parameters from the 2D motion field [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]; most of them use calibrated systems, require a preliminary decomposition of the displacement field into its translational and rotational components, or compute only a fuzzy region containing the FOE. Though several methods [7], [16], [17] demonstrate some robustness, their efficiency is strongly dependent on the angular velocity: optimal estimates of the FOE location are obtained only when bounded rotations are permitted and the FOE is constrained to lie in the field of view. TTC is estimated in a small neighborhood of the FOE location. In real contexts, a small amount of observer rotation or accidental camera vibration can shift the FOE outside the field of view. To manage this situation, methods based only on angular velocity computation have been proposed.

In Ref. [18] it is shown that correct heading direction estimates can be obtained, independently of the rotation performed by the vehicle on the planar surface, from the 3D translational motion parameters alone.

The method we use to estimate 3D egomotion is based on approximating the available 2D motion field with a linear combination of 2D basis functions that describe the projections on the image plane of 3D elementary motions [19]. The projection coefficients, obtained with a least-squares method, represent the 3D motion parameters describing the vehicle motion and are used to estimate the FOE location and the TTC directly. The performance of the method is independent of the angular velocity.
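Conceptually, this projection is an ordinary least-squares problem: the sampled displacement vectors are stacked into one long vector and regressed onto the elementary flow fields evaluated at the same image locations. The sketch below illustrates only this generic projection step; the function name and interface are ours, and the specific elementary fields of Ref. [19] are not reproduced (a concrete instance of the motion model is sketched after the equations in Section 3).

```python
import numpy as np

def project_onto_basis(points, flow, basis_fields):
    """Least-squares projection of a sampled 2D flow onto elementary flow fields.

    points       : (N, 2) image coordinates of the sparse features
    flow         : (N, 2) measured displacement vectors at those points
    basis_fields : list of callables, each mapping (N, 2) points -> (N, 2) flow

    Returns the coefficients c minimizing ||flow - sum_k c_k * B_k(points)||^2.
    """
    # Each basis field sampled at the feature points becomes one column of B.
    B = np.stack([b(points).ravel() for b in basis_fields], axis=1)   # (2N, K)
    coeffs, *_ = np.linalg.lstsq(B, flow.ravel(), rcond=None)
    return coeffs
```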

The motion field we provide as input to the egomotion estimation algorithm is a displacement vector field representing the correspondences among 2D features extracted in successive images of a temporal sequence and corresponding to the same 3D feature in the space. A small number of such displacement vectors on the image plane is enough to obtain useful information about egomotion parameters.

The standard way to estimate a 2D motion field as correspondences of interest features is to embody both local and global constraints on the extracted features within a search algorithm that explores the space of potential matches. Some criterion must be defined to distinguish correct from incorrect sets of matches (i.e. a functional measuring the merit of alternative mappings), and an algorithm to explore the search space is required. Although recursive search techniques provide a correct solution, they require a great deal of time to explore all possible solutions; optimization techniques, on the other hand, can converge to an optimal solution in polynomial time. The robustness of matches recovered by optimization techniques depends on the constraints considered and on the search approach employed, and it can be increased by including projective invariant constraints among features in a global optimization approach.

Projective invariants, popular in object recognition as useful descriptors of objects [20], are properties of the scene that remain unchanged under certain groups of transformations. Perceptual relationships such as parallelism and collinearity are viewpoint invariant and are strong cues for matching [21], [22].

Most of the invariants in use [20], [23] are derived for planar objects using geometric entities such as points, lines and conics, since in this case there exists a plane projective transformation between object and image space. The assumption needed to use projective invariants in indoor scenes can be considered appropriate, because indoor environments contain many instances of planar surfaces (drivable floor regions, walls, or surfaces of polyhedral objects). In our context of passive navigation, the camera-to-scene distance is sufficiently large for the whole observed scene to be well approximated by a plane; even when this condition is not met, most man-made indoor environments contain many planar surfaces on which the local planar approximation holds. We can therefore impose the projective invariance of the cross-ratio [24], [25] of five coplanar points as a global constraint in the matching process.
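For reference, one standard way of building such an invariant (not necessarily the exact formulation adopted in the paper) expresses it as a ratio of 3×3 determinants of homogeneous point coordinates; the sketch below computes it and verifies numerically that it is unchanged by an arbitrary plane homography:

```python
import numpy as np

def det3(p, q, r):
    # Determinant of the 3x3 matrix whose columns are the homogeneous points p, q, r.
    return np.linalg.det(np.stack([p, q, r], axis=1))

def five_point_invariant(pts):
    # One classical projective invariant of five coplanar points (no three collinear),
    # given as an illustration; the paper's exact cross-ratio formulation may differ.
    h = np.hstack([pts, np.ones((5, 1))])            # homogeneous coordinates
    p1, p2, p3, p4, p5 = h
    return (det3(p4, p3, p1) * det3(p5, p2, p1)) / (det3(p4, p2, p1) * det3(p5, p3, p1))

# Five coplanar points, no three collinear (illustrative values).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.4, 0.7]])

# An arbitrary invertible plane homography.
H = np.array([[1.00, 0.10, 0.20],
              [-0.05, 0.90, 0.10],
              [0.01, 0.02, 1.00]])
warped_h = np.hstack([pts, np.ones((5, 1))]) @ H.T
warped = warped_h[:, :2] / warped_h[:, 2:]

# The invariant survives the projective transformation.
assert np.isclose(five_point_invariant(pts), five_point_invariant(warped))
```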

The projective invariance of the cross-ratio of five coplanar points has been used in the literature as a constraint for optimal match selection in tracking algorithms, planar region detection and object recognition using probabilistic analysis [17], [25], [26], [27], [28], [29]. However, the performance of probabilistic approaches depends on the choice of the rule for deciding whether five image points have a given cross-ratio [24], [30]. Defining a robust decision rule (one that accounts for the effect on the cross-ratio of small errors in locating the image points) and choosing thresholds on the probabilities in that rule are difficult tasks. A largely unexplored direction is the incorporation of projective invariance constraints into a general optimization process for the computation of correspondences. Our idea is to overcome the problems deriving from the use of probabilistic decision rules by considering many intersecting subsets of five points, obtained as combinations of the available sparse features.

The correspondence problem is solved by searching for a solution in the space of all potential matches under fifth-order constraints [31], through a non-linear relaxation labeling approach [32], [33]. Relaxation labeling processes provide an efficient optimization tool for difficult constraint satisfaction problems, using contextual information to resolve local ambiguities and achieve global consistency.
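For concreteness, the sketch below shows the generic non-linear relaxation labeling update in the classical Rosenfeld-Hummel-Zucker style; the compatibility coefficients, which in our case would be derived from the cross-ratio constraints, are left as an abstract input, and the normalization choices are ours:

```python
import numpy as np

def relaxation_labeling(p, r, iterations=50):
    """Generic non-linear relaxation labeling update (Rosenfeld-Hummel-Zucker style).

    p : (N, L) initial label probabilities (one row per feature, one column per
        candidate match); rows sum to 1.
    r : (N, L, N, L) compatibility coefficients in [-1, 1]; in the paper these
        would encode the cross-ratio constraints, here they are left abstract.
    """
    n = p.shape[0]
    for _ in range(iterations):
        # Support q_i(l) = (1/N) * sum_j sum_m r[i, l, j, m] * p[j, m]  (kept in [-1, 1])
        q = np.einsum('iljm,jm->il', r, p) / n
        p = p * (1.0 + q)                           # reinforce well-supported labels
        p = np.clip(p, 1e-12, None)                 # numerical safety
        p /= p.sum(axis=1, keepdims=True)           # renormalize each row
    return p

# After convergence, the match assigned to feature i is read off as argmax of p[i].
```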

Summarizing, in our work a 2D displacement vector field is estimated first: Moravec's interest operator [34] is used to extract high-variance feature points in the sequence images, radiometric similarity is used as a unary constraint to define an initial set of potential matches, and the geometrical invariance of the cross-ratio is then embodied as a global constraint within a relaxation labeling approach that determines the set of optimal matches used to generate the correct sparse 2D motion field (Section 2). Finally (Section 3), the egomotion parameters are estimated from the optimal coefficients of the projection of the velocity field onto elementary motion fields. Experimental results (Section 4), obtained on several image sequences acquired in our laboratory with a TV camera mounted on our vehicle SAURO, are reported to show the performance of this approach on both FOE and TTC estimation.


Displacement field computation

Displacement vectors are estimated only for “high interest” features, salient points that can be matched more easily than other points. These points generally occur at corners and edges. In our work we use the interest operator introduced by Moravec [34] to isolate points with minimal autocorrelation values. The variance among neighboring pixels in four directions (vertical, horizontal and the two diagonals) is computed over a window, and the smallest value is taken as the interest value of the point.
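A minimal sketch of a Moravec-style interest measure along these lines (the window size, the squared-difference variation measure and the function interface are our illustrative choices):

```python
import numpy as np

def moravec_interest(image, half=2):
    """Moravec-style interest value: the minimum of the four directional variation
    measures (horizontal, vertical, two diagonals) computed over a window."""
    img = image.astype(float)
    H, W = img.shape
    interest = np.zeros_like(img)
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]          # the four directions
    for y in range(half + 1, H - half - 1):
        for x in range(half + 1, W - half - 1):
            win = img[y - half:y + half + 1, x - half:x + half + 1]
            scores = []
            for dy, dx in shifts:
                shifted = img[y - half + dy:y + half + 1 + dy,
                              x - half + dx:x + half + 1 + dx]
                scores.append(np.sum((win - shifted) ** 2))
            interest[y, x] = min(scores)                # smallest directional variation
    return interest

# Feature points would then be taken as local maxima of `interest` above a
# threshold, e.g. after a non-maximum suppression pass.
```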

3D motion estimation

The matches selected in the displacement field computation step are used to determine the 3D motion parameters describing the vehicle motion.

Assuming a perspective projection model in which a world point P = (X, Y, Z) projects onto the image point (x, y) = f(X/Z, Y/Z), where f is the focal length, Longuet-Higgins and Prazdny [35] derived the following equations describing the general rigid motion of an observer moving in a stationary world:

u = (Tx − x Tz)/Z(x, y) − xy Rx + (1 + x²) Ry − y Rz
v = (Ty − y Tz)/Z(x, y) − (1 + y²) Rx + xy Ry + x Rz
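Under these equations, and assuming (only for the purposes of this sketch) a single representative depth Z for all features, the field is linear in the six quantities (Tx/Z, Ty/Z, Tz/Z, Rx, Ry, Rz), so they can be recovered by ordinary least squares, and the FOE and TTC follow from ratios in which the unknown depth cancels. This is a simplified stand-in for the elementary-motion-field formulation of Ref. [19], not a reproduction of it:

```python
import numpy as np

def fit_motion_parameters(points, flow):
    """Least-squares fit of the motion-field equations above, assuming (only for
    this sketch) one representative depth Z for all features, so the field is
    linear in a = (Tx/Z, Ty/Z, Tz/Z, Rx, Ry, Rz). Coordinates are normalized
    by the focal length f."""
    x, y = points[:, 0], points[:, 1]
    one, zero = np.ones_like(x), np.zeros_like(x)
    # One row per feature for the u equation, one per feature for the v equation.
    A_u = np.stack([one, zero, -x, -x * y, 1 + x ** 2, -y], axis=1)
    A_v = np.stack([zero, one, -y, -(1 + y ** 2), x * y, x], axis=1)
    A = np.concatenate([A_u, A_v], axis=0)
    b = np.concatenate([flow[:, 0], flow[:, 1]])
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

# The depth scale cancels in the ratios of the fitted coefficients:
#   FOE = (Tx/Tz, Ty/Tz) = (a[0]/a[2], a[1]/a[2])   (normalized image coordinates)
#   TTC ≈ Z/Tz = 1/a[2]                             (in units of the frame interval)
```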

Experimental results: heading estimation

The experiments on heading direction estimation have been performed on images acquired in our laboratory with a TV camera mounted on a pan–tilt head installed on our vehicle SAURO. The goal was to make on-line adjustments to the current navigational path while the vehicle is moving in a stationary environment.

In Fig. 1 two images acquired during the experiments and the computed displacement field with the estimated FOE location are shown. These two images have been acquired under translational motion.

Experimental results: TTC estimation

Tests of the ability of the described system to perform obstacle avoidance have also been carried out on image sequences acquired in our laboratory with the same TV camera mounted on our vehicle SAURO. TTC estimation has been performed both for pure forward translation and for translation with rotations of varying magnitude.
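As background for these tests: for a purely translational field, the TTC of a feature equals Z/Tz, which on the image plane is the ratio between the feature's distance from the FOE and the radial component of its image velocity. A minimal sketch of such an estimator (an illustration, not necessarily the paper's exact formulation):

```python
import numpy as np

def ttc_from_radial_flow(points, flow, foe):
    """Time to collision (in frame intervals) from the radial flow around the FOE,
    assuming a purely translational field: TTC = Z/Tz = |p - FOE| / radial speed."""
    radial = points - foe                                   # vectors from the FOE
    dist = np.linalg.norm(radial, axis=1)
    radial_speed = np.sum(flow * radial, axis=1) / dist     # flow component along the radius
    return dist / radial_speed                              # one TTC estimate per feature

# A robust overall estimate could then be taken, e.g., as the median over features.
```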

The method we use to estimate 3D motion is based on computation of the projection coefficients that better describe the 2D motion field (a 2D pattern), the so

Conclusions

A robust description of the 3D motion of a vehicle moving in a stationary environment has been obtained from the 3D translational motion parameters, estimated as the optimal coefficients of the projection of the computed 2D motion field onto the 2D elementary translational motion fields. The 2D motion field is estimated through a relaxation labeling approach that searches for a set of mutually compatible feature matches by imposing the geometrical invariance of the cross-ratio of any set of five coplanar points.

References (35)

  • J. Aloimonos, C.M. Brown, Direct processing of curvilinear sensor motion from a sequence of perspective images,...
  • B.K.P. Horn, E.J. Weldon, Computationally efficient methods of recovering translational motion, Proceedings of the...
  • S. Negahdaripour et al., Direct passive navigation, IEEE Trans. Pattern Anal. Mach. Intell. (1987)
  • C. Fermuller, Passive navigation as a pattern recognition problem, Int. J. Comput. Vision (1995)
  • R.C. Nelson et al., Obstacle avoidance using flow field divergence, IEEE Trans. Pattern Anal. Mach. Intell. (1989)
  • W. Burger et al., Estimating 3D egomotion from perspective image sequences, IEEE Trans. Pattern Anal. Mach. Intell. (1990)
  • E. De Micheli et al., The accuracy of the computation of optical flow and of the recovery of motion parameters, IEEE Trans. Pattern Anal. Mach. Intell. (1993)