Matching scale-space features in 1D panoramas

https://doi.org/10.1016/j.cviu.2006.06.007

Abstract

We define a family of novel interest operators for extracting features from one-dimensional panoramic images for use in mobile robot navigation. Feature detection proceeds by applying local interest operators in the scale space of a 1D circular image formed by averaging the center scanlines of a cylindrical panorama. We demonstrate that many such features remain stable over changes in viewpoint and in the presence of noise and camera vibration, and define a feature descriptor that collects shape properties of the scale-space surface and color information from the original images. We then present a novel dynamic programming method to establish globally optimal correspondences between features in images taken from different viewpoints. Our method can handle arbitrary rotations and large numbers of missing features. It is also robust to significant changes in lighting conditions and viewing angle, as well as to some occlusion.

Introduction

Scale-invariant interest operators and feature detectors have received much recent attention in the computer vision and robotics literature [1], [2], [3], [4]. These methods work by computing the scale space of an image [5], [6], and finding the extrema of simple operators. Each such potential interest point, which identifies a location and a scale, is then augmented with a descriptor that encodes the image patch around this point in a manner that is invariant to local (rigid, affine, or perspective) transformations. Such descriptors can be stored and later used to identify scene locations observed from different viewpoints, with applications in object recognition, image retrieval, tracking, and robot localization.

The success of these methods inspires the work in this paper, where we apply similar ideas to extract stable features from the scale space of one-dimensional panoramic images. Instead of storing relatively few distinct features that are invariant over a wide range of scales and viewing angles, however, our work utilizes a large number of features that are only locally invariant to changes in scale and viewing angle. Robot localization can then be achieved by matching sets of features between the current frame and stored features of reference frames, taking into consideration their relative location and explicitly accounting for unmatched features.

Fig. 1 illustrates our experimental setup. It shows the robot equipped with an omnidirectional camera, a sample panoramic view, the 1D circular image formed by averaging the center scanlines, and an epipolar-plane image (EPI) [7], i.e., the evolution of the 1D image over time as the robot travels.

One-dimensional images offer several advantages over traditional 2D images. Chief among them are fast processing times and low storage requirements, enabling real-time analysis and a dense sampling of viewpoints. The reduced dimensionality also aids greatly in image matching since fewer parameters need to be estimated. Using 1D omnidirectional images for localization and navigation also presents several challenges, however.

First, for global invariance to viewpoint, the imaged scene has to lie in the plane traversed by the camera (the epipolar plane). While such images can be obtained with specialized sensors such as “strip cameras” [8], or simply by extracting a single scanline from a view taken with an omnidirectional camera, this requires that the robot travel on a planar surface, which largely limits the approach to indoor environments. Furthermore, it is difficult to maintain the camera’s orientation precisely due to vibrations of the robot [9].

Instead, we form our 1D images by averaging the center scanlines of the cylindrical view, typically subtending a vertical viewing angle of about 15°. Averaging multiple scanlines, however, also increases distance-dependent intensity changes, since the backprojection of a pixel into the scene now subtends a significant positive vertical angle (see Fig. 2). We thus trade distance-invariant intensities for robustness. A second issue is that the linear relationship between distance and scale is violated when using cylindrical, rather than planar, projection. Neither issue is a problem in practice, since intensities still change smoothly with distance, which in turn causes smooth changes in the scale space. We demonstrate below that we can robustly match features in the presence of such smooth changes.
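To make the construction concrete, here is a minimal sketch of the averaging step, assuming the panorama is available as a NumPy array whose rows span the vertical field of view; the function name and parameters are illustrative, not taken from the paper:

```python
import numpy as np

def circular_1d_image(panorama, vertical_fov_deg, strip_deg=15.0):
    """Average the center scanlines of a cylindrical panorama into a 1D image.

    panorama: H x W (grayscale) or H x W x 3 (color) array; columns span 360 deg.
    vertical_fov_deg: total vertical field of view of the panorama (assumed known).
    strip_deg: vertical angle subtended by the averaged strip (about 15 deg
               in the setup described above).
    """
    h = panorama.shape[0]
    strip = int(round(h * strip_deg / vertical_fov_deg))  # strip height in pixels
    top = (h - strip) // 2                                # center the strip vertically
    return panorama[top:top + strip].mean(axis=0)         # one value (or RGB triple) per column
```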

Another difficulty of the 1D approach is that one-dimensional images do not carry very much information. Distinct features that can be matched reliably and uniquely over wide ranges of views are rare, and a unique descriptor would have to span many pixels, increasing the chance of occlusion. We thus forgo global uniqueness of features in favor of a large number of simple features. This requires a global matching technique that not only matches individual features, but also considers their spatial relations. The appeal of a scale-space approach is that interest points correspond to scene features of all sizes, ranging from small details such as chair legs to large features such as entire walls of a room.

The aim of this paper is to investigate what can be done with 1D images alone. Clearly, one can imagine situations in which the additional information present in 2D images may be essential for disambiguation. There are also other data reduction techniques that could be employed. For real applications it may in fact be beneficial to combine 1D and 2D approaches.

The remainder of the paper is organized as follows. Section 2 discusses related work, and Section 3 presents the scale-space computation and interest point selection. Feature stability is evaluated in Section 4, and Section 5 presents our local feature descriptors and matching cost. Our global matching method and an experimental evaluation are presented in Section 6, and we conclude in Section 7.

Section snippets

Related work

There has been much recent work on invariant features in 2D images, including Lowe’s SIFT detector [1], [10], and the invariant interest points by Mikolajczyk and Schmid [2], [3]. Such features have been used for object recognition and image retrieval, as well as robot localization and navigation [11], [4], with a comparison of local image descriptors in [12].

The classic epipolar-plane image (EPI) analysis approach [7] has been applied to panoramic views by Zhu et al. [9] with the application

Scale-space analysis

The key idea of our method is to compute the scale space S(x, σ) of each omnidirectional image I(x) for a range of scales σ, and to detect locally scale-invariant interest points or “keypoints” in this space. The scale space is defined as the convolution of the image with a Gaussian G(x, σ) over a range of scales σ:

  S(x, σ) = I(x) ∗ G(x, σ),  with  G(x, σ) = 1/(√(2π) σ) · e^(−x²/(2σ²)).

This convolution is slightly unusual because the omnidirectional image I “wraps around”, i.e., I(x + 2π) = I(x). In particular,
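As a concrete illustration of this computation, the following minimal sketch uses SciPy's Gaussian filter with mode='wrap', which realizes the circular convolution implied by I(x + 2π) = I(x). The geometric scale sampling is an assumed choice, not necessarily the paper's:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def scale_space(image_1d, n_scales=32, sigma0=1.0, k=2 ** 0.25):
    """Stack of smoothed copies S(x, sigma) of a circular 1D image.

    mode='wrap' treats the signal as periodic, i.e. I(x + 2*pi) = I(x).
    Scales are sampled geometrically: sigma_i = sigma0 * k**i (an assumed choice).
    """
    sigmas = sigma0 * k ** np.arange(n_scales)
    S = np.stack([gaussian_filter1d(image_1d, s, mode='wrap') for s in sigmas])
    return S, sigmas
```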

Evaluating the robustness of features using tracking

Given an image sequence with closely spaced views, a good way of measuring the robustness and stability of features is to track features from frame to frame, and to record the track length for each feature. For each feature in the current frame, we search the next frame within a small neighborhood of the feature’s current (x, σ) location. We only consider neighboring scales σ (i.e., we use a vertical search radius of ±1), since the scale of a feature cannot change very quickly. In the horizontal
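Sketched below is one way to implement this search, assuming features are stored as (x, scale-index) grid positions; the horizontal search radius r_x is a free parameter here, since the snippet above is truncated before the paper's horizontal criterion:

```python
def track_feature(feature, next_frame_features, r_x, n_cols):
    """Find the continuation of a feature in the next frame, if any.

    feature: (x, s) grid position of the feature in the current frame.
    next_frame_features: list of (x, s) positions detected in the next frame.
    r_x: horizontal search radius in pixels (assumed parameter).
    n_cols: width of the circular image, for wrap-around distance.
    The scale search radius is +/-1, since scale cannot change very quickly.
    """
    x, s = feature
    best, best_dx = None, None
    for (x2, s2) in next_frame_features:
        if abs(s2 - s) > 1:
            continue  # only consider neighboring scales
        dx = min((x2 - x) % n_cols, (x - x2) % n_cols)  # circular distance
        if dx <= r_x and (best is None or dx < best_dx):
            best, best_dx = (x2, s2), dx
    return best  # None if the track ends here
```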

Local feature matching

Recall that our candidate features are the extrema in both difference scale spaces Dσ and Dx. While many of them correspond to real physical features in the original image and persist over changes in viewpoints, others are caused by image noise or by minor variations of intensities and colors. Some of the unstable features can be identified by their local properties, in particular by a small absolute value at the extremum, or low curvature around it. We exclude these features by imposing lower
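A sketch of this pruning by lower bounds, assuming the difference scale space is sampled on a discrete grid; the threshold values are placeholders, not the paper's:

```python
import numpy as np

def prune_keypoints(D, keypoints, min_value=0.02, min_curvature=0.005):
    """Discard extrema that are weak or sit on a flat part of the surface.

    D: 2D array, difference scale space D[s, x] (circular in x).
    keypoints: iterable of (s, x) extrema locations.
    min_value, min_curvature: assumed lower bounds, not the paper's values.
    """
    kept = []
    n = D.shape[1]
    for s, x in keypoints:
        if abs(D[s, x]) < min_value:
            continue  # small absolute value at the extremum
        # second difference along x (with wrap-around) as a crude curvature measure
        curv = abs(D[s, (x - 1) % n] - 2 * D[s, x] + D[s, (x + 1) % n])
        if curv < min_curvature:
            continue  # low curvature around the extremum
        kept.append((s, x))
    return kept
```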

Global feature matching

In the absence of narrow occluding objects, the features visible from two different locations will have the same relative ordering. This observation, known as the ordering constraint, enables an efficient algorithm for finding the globally optimal solution to the feature matching problem. Our algorithm is based on dynamic programming (DP), and is related to DP scanline algorithms that have been used in stereo matching [23], [24], [25], [26] which also use the ordering constraint. There are
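The following simplified sketch shows ordering-constrained DP in this spirit: it aligns two ordered feature sequences under a precomputed local matching cost, with a fixed (assumed) penalty for unmatched features, and omits the circular rotation handling that the full method adds:

```python
import numpy as np

def dp_match(cost, skip_penalty=1.0):
    """Globally optimal monotonic matching of two ordered feature sequences.

    cost: m x n array, cost[i, j] = local matching cost of feature i in the
          first image vs. feature j in the second (from the descriptors).
    skip_penalty: assumed constant cost for leaving a feature unmatched.
    Returns the list of matched index pairs, in order.
    """
    m, n = cost.shape
    T = np.zeros((m + 1, n + 1))
    T[:, 0] = np.arange(m + 1) * skip_penalty  # all of sequence 1 unmatched
    T[0, :] = np.arange(n + 1) * skip_penalty  # all of sequence 2 unmatched
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            T[i, j] = min(T[i - 1, j - 1] + cost[i - 1, j - 1],  # match i-1 with j-1
                          T[i - 1, j] + skip_penalty,            # feature i-1 unmatched
                          T[i, j - 1] + skip_penalty)            # feature j-1 unmatched
    # backtrack to recover the optimal correspondence
    matches, i, j = [], m, n
    while i > 0 and j > 0:
        if T[i, j] == T[i - 1, j - 1] + cost[i - 1, j - 1]:
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif T[i, j] == T[i - 1, j] + skip_penalty:
            i -= 1
        else:
            j -= 1
    return matches[::-1]
```

The ordering constraint is what makes this quadratic-time search exhaustive: because matched features must appear in the same relative order in both images, every legal correspondence is a monotonic path through the table T.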

Conclusion

In this paper, we have defined interest operators for extracting stable features from the scale space of one-dimensional omnidirectional images. We have defined a local matching cost based on feature descriptors, which store color information and shape properties of the scale-space surface. We then presented a novel dynamic programming method to establish globally optimal correspondences between features in different images. Experimental results show that our method handles arbitrary rotations,

Acknowledgments

Support for this work was provided in part by the National Science Foundation under Grants IIS-0118892, IIS-9984485, EIA-9806108, and by Middlebury College.

References (27)

  • D. Scharstein et al., Real-time recognition of self-similar landmarks, Image and Vision Computing (2001).
  • I. Cox et al., A maximum likelihood stereo algorithm, Computer Vision and Image Understanding (1996).
  • D. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the International Conference on...
  • K. Mikolajczyk, C. Schmid, Indexing based on scale invariant interest points, in: Proceedings of the International...
  • K. Mikolajczyk, C. Schmid, An affine invariant interest point detector, in: Proceedings of the European Conference on...
  • S. Se et al., Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks, International Journal of Robotics Research (2002).
  • T. Lindeberg, Scale-space for discrete signals, IEEE Transactions on Pattern Analysis and Machine Intelligence (1990).
  • A. Witkin et al., Signal matching through scale space, International Journal of Computer Vision (1987).
  • R. Bolles, H. Baker, Epipolar-plane image analysis: a technique for analyzing motion sequences, Tech. Rep. 377, AI...
  • S. Nayar, A. Karmarkar, 360×360 mosaics, in: Proceedings of the IEEE Conference on Computer Vision and Pattern...
  • Z. Zhu, G. Xu, X. Lin, Panoramic EPI generation and analysis of video from a moving platform with vibration, in:...
  • D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision (2004).
  • S. Se, D. Lowe, J. Little, Global localization using distinctive visual features, in: Proceedings of the IEEE/RSJ...