Passive navigation using egomotion estimates
Introduction
Passive navigation is the ability of an autonomous agent to determine its motion with respect to the environment. In many dynamic applications, such as autonomous robot navigation and car driving, the capacity for passive navigation is a prerequisite for any other navigational functionality.
The goal is to estimate the egomotion of a mobile vehicle moving on a flat surface in a stationary, man-made indoor environment, in order to make on-line adjustments to the current navigational path and to avoid obstacles. Psychophysical evidence [1] shows that the two main egomotion parameters that allow living organisms to perform these tasks are the Heading Direction (HD) and the Time To Collision (TTC).
Although heading information can be obtained from odometers or gyros (though with an unbounded incremental error) and time to collision can be estimated by sonar or infrared sensors (though with large measurement inaccuracies), both tasks can be performed by visual sensors with lower uncertainty.
Useful visual information can be obtained from a camera mounted on the mobile vehicle. The vehicle motion through the surrounding environment produces spatial and temporal changes in the viewed image giving rise to a 2D motion field that can be used for the reconstruction of both the vehicle motion and the 3D structure of the scene.
We consider the application context in which the viewer translates on a flat ground plane (the floor) and can rotate only around an axis orthogonal to it. In this context, the heading direction projects onto the image plane at a singular point, named the Focus of Expansion (FOE), where the 2D motion field vanishes. The resulting 2D motion field has a radial topology, with all 2D velocity vectors radiating from the FOE location. Rotations occurring while the observer translates only shift the FOE location, without perturbing the radial shape of the 2D motion field.
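Concretely, under perspective projection with focal length f, a translation (Tx, Ty, Tz) projects to the FOE at (f·Tx/Tz, f·Ty/Tz). A minimal sketch of this relation (function and variable names are illustrative, not from the paper):

```python
def focus_of_expansion(Tx, Ty, Tz, f):
    """Image-plane FOE for translation (Tx, Ty, Tz) and focal length f.

    Defined only when Tz != 0 (the observer has a forward translation
    component); otherwise the FOE lies at infinity.
    """
    if abs(Tz) < 1e-12:
        raise ValueError("FOE at infinity: no forward translation component")
    return f * Tx / Tz, f * Ty / Tz

# Pure forward translation puts the FOE at the image center.
print(focus_of_expansion(0.0, 0.0, 1.0, f=500.0))  # (0.0, 0.0)
```

Note that a sideways translation component (Tx or Ty large relative to Tz) moves the FOE far from the image center, which is exactly the situation in which field-of-view-bound methods degrade.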
Several methods have been proposed to estimate egomotion parameters from the 2D motion field [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]; most of them use calibrated systems, require a preliminary decomposition of the displacement field into its translational and rotational components, or compute only a fuzzy region containing the FOE. Though several methods [7], [16], [17] demonstrate some robustness, their efficiency depends strongly on the angular velocity: optimal estimates of the FOE location are obtained only when rotations are bounded and the FOE is constrained to lie in the field of view. TTC is estimated in a small neighborhood of the FOE location. In real contexts, small observer rotations or accidental camera vibrations can shift the FOE outside the field of view. To handle this situation, methods based only on angular velocity computation have been proposed.
In Ref. [18] it is shown that correct heading-direction estimates can be obtained from the 3D translational motion parameters alone, independently of the rotations performed by the vehicle on the planar surface.
The method we use to estimate 3D egomotion is based on approximating the available 2D motion field with a linear combination of 2D basis functions describing the projections on the image plane of 3D elementary motions [19]. The projection coefficients, obtained with a least-squares method, represent the 3D motion parameters describing the vehicle motion and are used to estimate the FOE location and the TTC directly. The performance of the method is independent of the angular velocity.
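The projection step above amounts to an ordinary least-squares fit of the sampled motion field against the elementary basis fields. A minimal sketch under that assumption (the basis fields here are toy examples, not the paper's actual basis set):

```python
import numpy as np

def project_onto_bases(flow, bases):
    """Least-squares coefficients expressing a sampled 2D motion field
    as a linear combination of elementary basis flow fields.

    flow  : (N, 2) array of displacement vectors at N image points
    bases : list of (N, 2) arrays, one per elementary motion field
    """
    A = np.stack([b.ravel() for b in bases], axis=1)   # (2N, K) design matrix
    coeffs, *_ = np.linalg.lstsq(A, flow.ravel(), rcond=None)
    return coeffs

# Toy example: a purely radial (expansion) field projects entirely
# onto the radial basis, with zero weight on the translation basis.
pts = np.random.default_rng(0).uniform(-1, 1, size=(50, 2))
radial = pts.copy()                         # expansion about the origin
horizontal = np.tile([1.0, 0.0], (50, 1))   # uniform horizontal shift
c = project_onto_bases(2.0 * radial, [radial, horizontal])
print(np.round(c, 6))  # ≈ [2, 0]
```

The same linear-algebra machinery tolerates sparse, irregularly placed samples, which is why a small number of feature matches suffices.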
The motion field we provide as input to the egomotion estimation algorithm is a displacement vector field representing the correspondences among 2D features extracted from successive images of a temporal sequence, each corresponding to the same 3D feature in space. A small number of such displacement vectors on the image plane is enough to obtain useful information about the egomotion parameters.
The standard method for estimating a 2D motion field as correspondences of interest features is to embody both local and global constraints about the extracted features within a search algorithm that explores the space of potential matches to extract a final set of matches. Some criterion must be defined to distinguish correct from incorrect sets of matches (i.e., a functional measuring the merit of alternative mappings), together with an algorithm to explore the search space. Though recursive search techniques provide a correct solution, they require considerable time to explore all possible solutions; optimization techniques, on the other hand, can converge to an optimal solution in polynomial time. The robustness of the matches recovered with optimization techniques depends on the constraints considered and on the search approach employed; it can be increased by including projective-invariant constraints among the features in a global optimization approach.
Projective invariants, popular in object recognition as useful descriptors of objects [20], are properties of the scene that remain unchanged under certain groups of transformations. Perceptual relationships such as parallelism and collinearity are viewpoint invariant and are strong cues for matching [21], [22].
Most of the invariants in use [20], [23] are derived for planar objects from geometric entities such as points, lines and conics, since in this case there exists a plane projective transformation between object and image space. The use of projective invariants in indoor scenes is appropriate, because indoor environments present many instances of planar surfaces (drivable regions, walls, or faces of polyhedral objects). In our passive-navigation context the camera–scene distance is large enough that the whole observed scene is well approximated by a plane; even when this condition does not hold, most man-made indoor scenes contain many planar surfaces on which the local planar approximation is valid. We can therefore successfully impose the projective invariance of the cross-ratio [24], [25] of five coplanar points as a global constraint in the matching process.
The projective invariance of the cross-ratio of five coplanar points has been used in the literature as a constraint for optimal match selection in tracking algorithms, planar-region detection, and object recognition using probabilistic analysis [17], [25], [26], [27], [28], [29]. The performance of probabilistic approaches, however, depends on the rule chosen for deciding whether five image points have a given cross-ratio [24], [30]. Defining a robust decision rule (one that takes into account the effects on the cross-ratio of small errors in locating the image points) and setting thresholds on the probabilities in that rule are difficult tasks. An unexplored direction is to embed projective-invariance constraints in a general optimization process for the computation of correspondences. Our idea is to overcome the problems of probabilistic decision rules by considering many intersecting subsets of five points, obtained as combinations of the available sparse features.
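One standard way to form a five-point projective invariant is as a ratio of triangle determinants in which each point index appears equally often in numerator and denominator, so that all homography and scale factors cancel. A minimal sketch of this construction (one of the two independent invariants; not necessarily the exact formulation used in the paper):

```python
import numpy as np

def det3(p, q, r):
    """Determinant of three points given as homogeneous 3-vectors."""
    return np.linalg.det(np.stack([p, q, r]))

def five_point_invariant(pts):
    """A projective invariant of five coplanar points.

    Each index appears twice across numerator and denominator taken
    together, so per-point scale factors and det(H) cancel exactly.
    """
    p1, p2, p3, p4, p5 = pts
    return (det3(p1, p2, p3) * det3(p1, p4, p5)) / \
           (det3(p1, p2, p4) * det3(p1, p3, p5))

rng = np.random.default_rng(1)
pts = [np.append(rng.uniform(0, 1, 2), 1.0) for _ in range(5)]
H = rng.uniform(-1, 1, (3, 3))       # a random plane homography
mapped = [H @ p for p in pts]
print(np.isclose(five_point_invariant(pts),
                 five_point_invariant(mapped)))  # True
```

In a matching context, a candidate match set is penalized when this quantity differs between the two images for some five-point subset of the features.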
The correspondence problem is solved by searching the space of all potential matches for a solution satisfying fifth-order constraints [31] through a non-linear relaxation labeling approach [32], [33]. Relaxation labeling processes provide an efficient optimization tool for difficult constraint satisfaction problems, using contextual information to resolve local ambiguities and achieve global consistency.
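The core of a nonlinear relaxation labeling process is a support-and-normalize update on label probabilities, driven by pairwise compatibility coefficients. A minimal sketch of that generic update rule (a textbook form, not the paper's exact scheme or its fifth-order compatibilities):

```python
import numpy as np

def relaxation_labeling(p, r, iters=50):
    """Nonlinear relaxation labeling, support-and-normalize form.

    p : (N, L) initial label probabilities, rows summing to 1
    r : (N, L, N, L) compatibility coefficients in [0, 1],
        r[i, a, j, b] = compatibility of label a at object i
        with label b at object j
    """
    for _ in range(iters):
        q = np.einsum('iajb,jb->ia', r, p)   # contextual support
        p = p * q                            # reinforce supported labels
        p /= p.sum(axis=1, keepdims=True)    # renormalize rows
    return p

# Two objects, two labels; compatibilities favor agreeing labels,
# so a slight initial bias toward label 0 is amplified to consensus.
r = np.full((2, 2, 2, 2), 0.1)
for i in range(2):
    for j in range(2):
        for a in range(2):
            r[i, a, j, a] = 1.0
p0 = np.array([[0.6, 0.4], [0.55, 0.45]])
print(np.round(relaxation_labeling(p0, r), 3))  # converges toward label 0
```

In the matching application, objects are candidate features, labels are candidate correspondences, and the compatibilities encode agreement of cross-ratios over five-point subsets.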
Summarizing, our work first estimates a 2D displacement vector field: Moravec's interest operator [34] is used to extract high-variance feature points in the sequence images, radiometric similarity is used as a unary constraint to define an initial set of potential matches, and the geometric invariance of the cross-ratio is embodied as a global constraint within a relaxation labeling approach that determines the set of optimal matches generating the correct 2D sparse motion field (Section 2). The egomotion parameters are then estimated from the optimal coefficients of the projection of the velocity field onto elementary motion fields (Section 3). Experimental results (Section 4), obtained on several image sequences acquired in our laboratory with a TV camera mounted on our vehicle SAURO, show the performance of the approach on both FOE and TTC estimation.
Displacement field computation
Displacement vectors are estimated only for “high” interest features, i.e. salient points that can be matched more easily than other points. Such points generally occur at corners and edges. In our work we use the interest operator introduced by Moravec [34] to isolate points with minimal autocorrelation values. The variance among neighboring pixels in four directions (vertical, horizontal and two diagonals) is computed over a window; the smallest of the four values is taken as the interest value of the point.
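A minimal sketch of the Moravec operator as described above, using sums of squared differences over a small window (window size and the discrete shift set are illustrative choices):

```python
import numpy as np

def moravec_interest(img, win=2):
    """Moravec interest value per pixel: the minimum, over four
    directions (horizontal, vertical, two diagonals), of the sum of
    squared differences between a (2*win+1)^2 window and its
    one-pixel-shifted copy.
    """
    img = img.astype(float)
    h, w = img.shape
    out = np.zeros((h, w))
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for y in range(win, h - win - 1):
        for x in range(win + 1, w - win - 1):
            ssds = []
            for dy, dx in shifts:
                patch = img[y - win:y + win + 1, x - win:x + win + 1]
                moved = img[y - win + dy:y + win + 1 + dy,
                            x - win + dx:x + win + 1 + dx]
                ssds.append(np.sum((patch - moved) ** 2))
            out[y, x] = min(ssds)    # the interest value of the pixel
    return out

# A corner scores higher than a straight edge: along an edge, the
# shift parallel to the edge gives zero SSD, driving the minimum to 0.
img = np.zeros((16, 16))
img[8:, 8:] = 1.0                    # bright quadrant with a corner at (8, 8)
scores = moravec_interest(img)
print(scores[8, 8] > scores[8, 12])  # True: corner beats edge point
```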
3D motion estimation
The matches selected in the displacement field computation step are used to determine the 3D motion parameters describing the vehicle motion.
Assuming a perspective projection model in which a world point P=(X,Y,Z) projects to the image point (x,y)=f(X/Z,Y/Z), where f is the focal length, Longuet-Higgins and Prazdny [35] derived the following equations to describe the general rigid motion of an observer moving in a stationary world:
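For reference, the Longuet-Higgins and Prazdny image-motion equations in one standard form, with translational velocity (U, V, W), angular velocity (A, B, C) and depth Z(x, y) (sign conventions vary between texts), are:

```latex
u = \frac{xW - fU}{Z} + \frac{Axy}{f} - B\left(f + \frac{x^2}{f}\right) + Cy
v = \frac{yW - fV}{Z} + A\left(f + \frac{y^2}{f}\right) - \frac{Bxy}{f} - Cx
```

The first term of each equation is the depth-dependent translational component; the remaining terms form the depth-independent rotational component.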
Experimental results: heading estimation
The experiments on heading direction estimation have been performed on images acquired in our laboratory with a TV camera mounted on a pan–tilt head installed on our vehicle SAURO. The goal was to make on-line adjustments to the current navigational path while the vehicle is moving in a stationary environment.
In Fig. 1 two images acquired during the experiments and the computed displacement field with the estimated FOE location are shown. These two images have been acquired under translational motion.
Experimental results: TTC estimation
Tests on the ability of the described system to perform obstacle avoidance have also been performed on image sequences acquired in our laboratory with the same TV camera mounted on our vehicle SAURO. We have performed TTC estimation both for forward translation only and for translation with rotations of different magnitudes.
The method we use to estimate 3D motion is based on the computation of the projection coefficients that best describe the 2D motion field (a 2D pattern).
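For a purely radial field, TTC can be read off the expansion about the FOE: a point at image distance r from the FOE moves radially with speed r/TTC. A minimal sketch under that assumption (names are illustrative; the paper estimates TTC from the 3D motion parameters):

```python
import numpy as np

def time_to_collision(point, foe, flow):
    """TTC from the radial expansion of the flow about the FOE.

    point : (x, y) image coordinates of a feature
    foe   : (x, y) focus of expansion
    flow  : (u, v) displacement of the feature per frame interval

    For pure translation, dr/dt = r / TTC, hence TTC = r / (dr/dt),
    expressed in frame intervals.
    """
    radial = np.subtract(point, foe)
    r = np.linalg.norm(radial)
    # Component of the flow along the radial direction from the FOE.
    dr = np.dot(flow, radial) / r
    return r / dr

# A point 100 px from the FOE expanding by 2 px/frame: 50 frames to contact.
print(time_to_collision((100.0, 0.0), (0.0, 0.0), (2.0, 0.0)))  # 50.0
```

Projecting the flow onto the radial direction discards the rotational component orthogonal to it, which is consistent with the claim that the estimates are insensitive to angular velocity.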
Conclusions
A robust description of the 3D motion of a vehicle moving in a stationary environment has been obtained from the 3D translational motion parameters, estimated as the optimal coefficients of the projection of the computed 2D motion field onto the 2D elementary translational motion fields. The 2D motion field is estimated through a relaxation labeling approach that searches for a set of mutually compatible feature matches by imposing the geometric invariance of the cross-ratio of any set of five coplanar points.
References (35)
- Determining the instantaneous direction of motion from optical flow generated by a curvilinearly moving observer, CGIP (1981)
- Three-dimensional object recognition from single 2D images, Artificial Intell. (1987)
- et al., Using projective geometry to recover planar surfaces in stereovision, Pattern Recognition (1996)
- et al., Iterative pose estimation using coplanar feature points, Comput. Vision Image Understanding (1996)
- Computational cross-ratio for computer vision, CVGIP: Image Understanding (1994)
- The Perception of the Visual World (1950)
- M.E. Spetsakis, J. Aloimonos, Optimal computing of structure from motion using point correspondences in two frames, ...
- Relative orientation, Int. J. Comput. Vision (1990)
- G.S. Young, R. Chellappa, 3-D motion estimation using a sequence of noisy stereo images, Proceedings of the IEEE ...
- J. Weng, T.S. Huang, N. Ahuja, A two step approach to optimal motion estimation and structure estimation, Proceedings ...
- Direct passive navigation, IEEE Trans. Pattern Anal. Mach. Intell.
- Passive navigation as a pattern recognition problem, Int. J. Comput. Vision
- Obstacle avoidance using flow field divergence, IEEE Trans. Pattern Anal. Mach. Intell.
- Estimating 3D egomotion from perspective image sequences, IEEE Trans. Pattern Anal. Mach. Intell.
- The accuracy of the computation of optical flow and of the recovery of motion parameters, III, IEEE Trans. Pattern Anal. Mach. Intell.