Matching scale-space features in 1D panoramas
Introduction
Scale-invariant interest operators and feature detectors have received much recent interest in the computer vision and robotics literature [1], [2], [3], [4]. These methods work by computing the scale space of an image [5], [6], and finding the extrema of simple operators. Each such potential interest point, which identifies a location and a scale, is then augmented with a descriptor that encodes the image patch around this point in a manner that is invariant to local (rigid, affine, or perspective) transformations. Such descriptors can be stored and later used to identify scene locations observed from different viewpoints, with applications in object recognition, image retrieval, tracking, and robot localization.
The success of these methods inspires the work in this paper, where we apply similar ideas to extract stable features from the scale space of one-dimensional panoramic images. Instead of storing relatively few distinct features that are invariant over a wide range of scales and viewing angles, however, our work utilizes a large number of features that are only locally invariant to changes in scale and viewing angle. Robot localization can then be achieved by matching sets of features between the current frame and stored features of reference frames, taking into consideration their relative location and explicitly accounting for unmatched features.
Fig. 1 illustrates our experimental setup. It shows the robot equipped with an omnidirectional camera, a sample panoramic view, the 1D circular image formed by averaging the center scanlines, and an epipolar-plane image (EPI) [7], i.e., the evolution of the 1D image over time as the robot travels.
One-dimensional images offer several advantages over traditional 2D images. Chief among them are fast processing times and low storage requirements, enabling real-time analysis and a dense sampling of viewpoints. The reduced dimensionality also aids greatly in image matching since fewer parameters need to be estimated. Using 1D omnidirectional images for localization and navigation also presents several challenges, however.
First, for global invariance to viewpoints, the imaged scene has to lie in the plane traversed by the camera (the epipolar plane). While such images can be obtained with specialized sensors such as “strip cameras” [8], or simply by extracting a single scanline from a view taken with an omnidirectional camera, this requires that the robot travel on a planar surface, which limits applicability to indoor environments. Furthermore, it is difficult to maintain the camera’s orientation precisely due to vibrations of the robot [9].
Instead, we form our 1D images by averaging the center scanlines of the cylindrical view, typically subtending a vertical viewing angle of about 15°. Averaging multiple scanlines, however, also increases distance-dependent intensity changes, since the backprojection of a pixel into the scene now subtends a significant positive vertical angle (see Fig. 2). We thus trade distance-invariant intensities for robustness. A second issue is that the linear relationship between distance and scale is violated when using cylindrical, rather than planar, projection. Neither issue is a problem in practice, since intensities still change smoothly with distance, which in turn causes smooth changes in the scale space. We demonstrate below that we can robustly match features in the presence of such smooth changes.
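The scanline-averaging step described above can be sketched as follows. This is an illustrative sketch only: the band width is derived from the paper's 15° figure, while the panorama's total vertical field of view (`vfov_deg`) and the function name are assumptions not taken from the paper.

```python
import numpy as np

def panorama_to_1d(cyl_image, fov_deg=15.0, vfov_deg=60.0):
    """Collapse a cylindrical panorama (H x W, grayscale) into a 1D
    circular image by averaging a band of center scanlines.

    fov_deg: vertical angle subtended by the averaged band (about 15
    degrees in the paper).  vfov_deg: full vertical field of view of
    the panorama (a hypothetical value here).
    """
    h, w = cyl_image.shape
    band = max(1, int(round(h * fov_deg / vfov_deg)))  # number of rows to average
    top = (h - band) // 2                              # center the band vertically
    return cyl_image[top:top + band, :].astype(np.float64).mean(axis=0)

# Example: a synthetic 60-row, 360-column panorama with a horizontal gradient
pano = np.tile(np.linspace(0.0, 1.0, 360), (60, 1))
line = panorama_to_1d(pano)
print(line.shape)  # (360,)
```

Because each column of the synthetic panorama is constant, the averaged 1D image simply reproduces the gradient; on real data the averaging suppresses noise at the cost of the distance-dependent intensity changes noted above.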
Another difficulty of the 1D approach is that one-dimensional images do not carry very much information. Distinct features that can be matched reliably and uniquely over wide ranges of views are rare. A unique descriptor would have to span many pixels, increasing the chance of occlusion. We thus forego global uniqueness of features in favor of a large number of simple features. This requires a global matching technique that not only matches individual features, but also considers their spatial relation. The appeal of a scale-space approach is that interest points correspond to scene features of all sizes, ranging from small details such as chair legs to large features, such as entire walls of a room.
The aim of this paper is to investigate what can be done with 1D images alone. Clearly, one can imagine situations in which the additional information present in 2D images may be essential for disambiguation. There are also other data reduction techniques that could be employed. For real applications it may in fact be beneficial to combine 1D and 2D approaches.
The remainder of the paper is organized as follows. Section 2 discusses related work, and Section 3 presents the scale-space computation and interest point selection. Feature stability is evaluated in Section 4, and Section 5 presents our local feature descriptors and matching cost. Our global matching method and an experimental evaluation are presented in Section 6, and we conclude in Section 7.
Section snippets
Related work
There has been much recent work on invariant features in 2D images, including Lowe’s SIFT detector [1], [10], and the invariant interest points by Mikolajczyk and Schmid [2], [3]. Such features have been used for object recognition and image retrieval, as well as robot localization and navigation [11], [4], with a comparison of local image descriptors in [12].
The classic epipolar-plane image (EPI) analysis approach [7] has been applied to panoramic views by Zhu et al. [9] with the application
Scale-space analysis
The key idea of our method is to compute the scale space S(x, σ) of each omnidirectional image I(x) for a range of scales σ, and to detect locally scale-invariant interest points or “keypoints” in this space. The scale space is defined as the convolution of the image with a Gaussian G(x, σ) over a range of scales σ:

S(x, σ) = G(x, σ) ∗ I(x).

This convolution is slightly unusual due to the fact that the omnidirectional image I “wraps around”, i.e., I(x + 2π) = I(x). In particular,
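A minimal sketch of this wrap-around scale-space computation, assuming a pure-NumPy implementation; the O(n²) circulant formulation below is chosen for clarity over speed, and the set of scales is a hypothetical example:

```python
import numpy as np

def scale_space(I, sigmas):
    """Scale space S(x, sigma) of a circular 1D image I(x), computed as a
    circular (wrap-around) convolution with a Gaussian, honoring the
    periodicity I(x + 2*pi) = I(x)."""
    n = len(I)
    x = np.arange(n)
    # circular pixel distance between every pair of positions
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, n - d).astype(float)
    S = []
    for s in sigmas:
        G = np.exp(-d ** 2 / (2.0 * s ** 2))
        G /= G.sum(axis=1, keepdims=True)  # normalize so constant signals are preserved
        S.append(G @ I)
    return np.stack(S)

# one period of a cosine on a 360-pixel circular image
I = np.cos(np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False))
S = scale_space(I, sigmas=[1, 2, 4, 8])
print(S.shape)  # (4, 360)
```

Larger scales attenuate the cosine more strongly, as expected of Gaussian smoothing; the circular distance matrix ensures there are no boundary effects at x = 0.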
Evaluating the robustness of features using tracking
Given an image sequence with closely spaced views, a good way of measuring the robustness and stability of features is to track features from frame to frame, and to record the track length for each feature. For each feature in the current frame, we search the next frame within a small neighborhood of the feature’s current (x, σ) location. We only consider neighboring scales σ (i.e., we use a vertical search radius of ±1), since the scale of a feature cannot change very quickly. In the horizontal
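The search described above can be sketched as a greedy nearest-neighbor tracker. This is an illustrative approximation, not the paper's exact procedure: the horizontal search radius, the 360-pixel image width, and the tie-breaking rule are assumptions.

```python
def track(features_curr, features_next, dx_max=3, dsigma_max=1):
    """For each (x, sigma_index) feature in the current frame, find the
    nearest feature in the next frame within +-dx_max pixels horizontally
    (wrapping around a 360-pixel circular image, an assumed width) and
    +-1 scale level vertically.  Returns, per feature, the index of its
    match in features_next, or None if the track ends."""
    matches = []
    for (x, s) in features_curr:
        best = None
        for j, (x2, s2) in enumerate(features_next):
            dx = min(abs(x - x2), 360 - abs(x - x2))  # circular distance
            if dx <= dx_max and abs(s - s2) <= dsigma_max:
                if best is None or dx < best[0]:
                    best = (dx, j)
        matches.append(best[1] if best else None)
    return matches

matches = track([(10, 2), (100, 5)], [(12, 2), (300, 5)])
print(matches)  # [0, None]
```

Chaining such per-frame matches yields a track length for each feature, which serves as the stability measure.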
Local feature matching
Recall that our candidate features are the extrema in both difference scale spaces Dσ and Dx. While many of them correspond to real physical features in the original image and persist over changes in viewpoints, others are caused by image noise or by minor variations of intensities and colors. Some of the unstable features can be identified by their local properties, in particular by a small absolute value at the extremum, or low curvature around it. We exclude these features by imposing lower
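The pruning of unstable extrema by response magnitude and local curvature can be sketched as follows; the threshold values and the use of a plain second difference for curvature are assumptions for illustration, not the paper's exact criteria.

```python
import numpy as np

def filter_keypoints(D, extrema, min_val=0.02, min_curv=0.005):
    """Discard unstable extrema of a difference scale space D (scales x
    pixels, circular in x) by imposing lower bounds on the absolute
    response at the extremum and on the curvature around it
    (approximated by the second difference along x)."""
    kept = []
    n = D.shape[1]
    for (s, x) in extrema:
        val = D[s, x]
        curv = D[s, (x - 1) % n] - 2 * val + D[s, (x + 1) % n]
        if abs(val) >= min_val and abs(curv) >= min_curv:
            kept.append((s, x))
    return kept

D = np.zeros((1, 10))
D[0, 3] = 1.0    # strong, sharp extremum: kept
D[0, 7] = 0.001  # weak extremum, below the response threshold: discarded
print(filter_keypoints(D, [(0, 3), (0, 7)]))  # [(0, 3)]
```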
Global feature matching
In the absence of narrow occluding objects, the features visible from two different locations will have the same relative ordering. This observation, known as the ordering constraint, enables an efficient algorithm for finding the globally optimal solution to the feature matching problem. Our algorithm is based on dynamic programming (DP), and is related to DP scanline algorithms that have been used in stereo matching [23], [24], [25], [26] which also use the ordering constraint. There are
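An ordering-preserving matcher of this kind can be sketched with a standard alignment-style DP, in the spirit of DP scanline stereo; the uniform skip penalty for unmatched features is an assumption, and the paper's actual cost model may differ.

```python
def dp_match(cost, skip=1.0):
    """Globally optimal, ordering-preserving feature matching.
    cost[i][j]: local matching cost of feature i (image A) and feature j
    (image B), both lists sorted by angular position.  skip: hypothetical
    penalty for leaving a feature unmatched.  Returns (total, matches)."""
    n, m = len(cost), len(cost[0])
    INF = float('inf')
    T = [[INF] * (m + 1) for _ in range(n + 1)]
    T[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match i-1 with j-1
                T[i][j] = min(T[i][j], T[i - 1][j - 1] + cost[i - 1][j - 1])
            if i > 0:            # leave feature i-1 of A unmatched
                T[i][j] = min(T[i][j], T[i - 1][j] + skip)
            if j > 0:            # leave feature j-1 of B unmatched
                T[i][j] = min(T[i][j], T[i][j - 1] + skip)
    matches, i, j = [], n, m     # backtrack to recover the matched pairs
    while i > 0 or j > 0:
        if i > 0 and j > 0 and T[i][j] == T[i - 1][j - 1] + cost[i - 1][j - 1]:
            matches.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and T[i][j] == T[i - 1][j] + skip:
            i -= 1
        else:
            j -= 1
    return T[n][m], matches[::-1]

cost = [[0.1, 5.0],
        [5.0, 0.2]]
total, matches = dp_match(cost)
print(round(total, 6), matches)  # 0.3 [(0, 0), (1, 1)]
```

For circular 1D images, the features have no natural start point, so such a DP would additionally have to be evaluated over cyclic shifts of one feature sequence to handle arbitrary rotations.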
Conclusion
In this paper, we have defined interest operators for extracting stable features from the scale space of one-dimensional omnidirectional images. We have defined a local matching cost based on feature descriptors, which store color information and shape properties of the scale-space surface. We then presented a novel dynamic programming method to establish globally optimal correspondences between features in different images. Experimental results show that our method handles arbitrary rotations,
Acknowledgments
Support for this work was provided in part by the National Science Foundation under Grants IIS-0118892, IIS-9984485, EIA-9806108, and by Middlebury College.
References (27)
- et al., Real-time recognition of self-similar landmarks, Image and Vision Computing (2001)
- et al., A maximum likelihood stereo algorithm, Computer Vision and Image Understanding (1996)
- D. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the International Conference on...
- K. Mikolajczyk, C. Schmid, Indexing based on scale invariant interest points, in: Proceedings of the International...
- K. Mikolajczyk, C. Schmid, An affine invariant interest point detector, in: Proceedings of the European Conference on...
- et al., Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks, International Journal of Robotics Research (2002)
- Scale-space for discrete signals, IEEE Transactions on Pattern Analysis and Machine Intelligence (1990)
- et al., Signal matching through scale space, International Journal of Computer Vision (1987)
- R. Bolles, H. Baker, Epipolar-plane image analysis: a technique for analyzing motion sequences, Tech. Rep. 377, AI...
- S. Nayar, A. Karmarkar, 360×360 mosaics, in: Proceedings of the IEEE Conference on Computer Vision and Pattern...
- Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision