Keypoint descriptor matching with context-based orientation estimation☆
Introduction
Keypoints extracted from digital images have been adopted with good results as primitive parts in many computer vision tasks, such as recognition [1], tracking [2] and 3D reconstruction [3]. The detection and extraction of meaningful image regions, named keypoints or image features, is usually the first step of these methodologies. Numerical vectors that embody the image region properties are subsequently computed to compare the keypoints found according to the particular task.
Over the last decade, different feature detectors have been proposed, including, but not limited to, corner and blob detectors, which are invariant either to affine transformations or to scale and rotation only. The reader may refer to [4] for a general overview.
After the keypoint is located, a meaningful descriptor vector embodying the characteristic properties of the keypoint support region (i.e. its neighborhood) is computed. Different descriptors have been developed, which can be divided mainly into two categories: distribution-based descriptors and banks of filters. In general, while the former give better results, the latter provide more compact descriptors. Banks of filters include complex filters, color moments, the local jet of the keypoint, differential operators and Haar wavelet coefficients. Refer to [5] for more details.
Distribution-based descriptors, also named histogram-based descriptors, divide the keypoint region, also called feature patch, into different areas and compute specific histograms related to some image properties for each area. The final descriptor is given by the ordered concatenation of these histograms. The rank and the census transforms [6], which consider binary comparisons of the intensity of the central pixel against its neighborhood, are the precursors of the histogram-based descriptors. In particular, the CS-LBP [7] descriptor can be considered an extension of this kind of approach. The spin image descriptor, the shape context and the geometric blur, as well as the more recent DAISY, BRIEF, BRISK and FREAK descriptors (see [5], [8]), should also be mentioned.
One of the most popular descriptors based on histograms is surely the SIFT (Scale Invariant Feature Transform) [9], which is a 3D histogram of gradient orientations on a Cartesian grid. SIFT has been extended in various ways since its first introduction. The PCA-SIFT descriptor [10] increases the robustness of the descriptor and decreases its length by applying PCA (Principal Component Analysis). RIFT (Rotation Invariant Feature Transform) [11] is a ring-based rotation-invariant version, GLOH (Gradient Location and Orientation Histogram) [5] combines a log-polar grid with PCA, and SURF [12] is an efficient discrete SIFT variant. More recently, RootSIFT [13] improved upon SIFT by replacing the Euclidean distance with the Bhattacharyya distance, after normalizing the descriptor vector with the Manhattan norm instead of the conventional Euclidean norm. MROGH (Multi Support Region Order Based Gradient Histogram) [14] uses multiple overlapping support regions combined by intensity order pooling. Furthermore, LIOP (Local Intensity Order Pattern) [15] uses the intensity order pooling and the relative order of neighbor pixels to define the histogram.
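The RootSIFT idea mentioned above can be checked numerically with a short sketch. It assumes non-negative histogram descriptors and follows the commonly described recipe (Manhattan normalization followed by an element-wise square root); the function names and the `eps` guard are illustrative and not taken from [13]. For two Manhattan-normalized histograms x and y, the squared Euclidean distance between their square-rooted versions equals 2 − 2·BC(x, y), where BC is the Bhattacharyya coefficient, so ranking by one is equivalent to ranking by the other.

```python
import numpy as np

def root_sift(descriptor, eps=1e-12):
    """L1-normalize a non-negative histogram descriptor, then take square roots."""
    d = np.asarray(descriptor, dtype=float)
    d = d / (np.abs(d).sum() + eps)   # Manhattan (L1) normalization
    return np.sqrt(d)                 # element-wise square root

def bhattacharyya_coefficient(x, y, eps=1e-12):
    """Bhattacharyya coefficient between two non-negative histograms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x / (x.sum() + eps)
    y = y / (y.sum() + eps)
    return float(np.sum(np.sqrt(x * y)))

# Random non-negative 128-element vectors stand in for SIFT descriptors.
rng = np.random.default_rng(0)
a = rng.random(128)
b = rng.random(128)

squared_euclidean = float(np.sum((root_sift(a) - root_sift(b)) ** 2))
from_coefficient = 2.0 - 2.0 * bhattacharyya_coefficient(a, b)
```

Here `squared_euclidean` and `from_coefficient` agree up to floating-point error, which is why an unmodified Euclidean matcher can be reused on the transformed descriptors.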
Over the last few years, machine learning techniques have been applied to remove the correlation between the descriptor elements and to reduce the descriptor dimension [10], [16], while different histogram distances have been used to improve the matches [17], [18].
Different methodologies for evaluating feature descriptors and detectors have been proposed [4], [5], [8], [16], [19], [20], [21], [22], [23]. In the case of planar images, the Oxford dataset benchmark [4], [24] is a well-established de facto standard, although an extension to non-planar images is not immediate [19]. Other evaluation methodologies use laser-scanner images [21], structure from motion algorithms [16], [23] or epipolar reprojection on more than two images [20], but in general they require a complex and error-prone setup.
This paper presents in Section 2 a matching strategy to improve the discriminative power of histogram-based keypoint descriptors by constraining the range of allowable orientations according to the scene context.
We build the proposed matching strategy on the sGLOH (shifting GLOH) descriptor, presented in our previous work [25] and described in Section 2. It uses a circular grid to embed multiple descriptor instances of the same feature patch, each with a different dominant discrete orientation, into a single feature vector. Each descriptor instance is accessible by an internal shift of the feature vector elements without the need to recompute the histograms. The matching distance between features is modified to consider the minimum distance among all descriptor instances for the possible dominant discrete orientations.
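The shifting mechanism can be sketched as follows. The array layout is an assumption made for illustration: a descriptor is stored as an array of shape (rings, m, m), indexed by ring, angular sector and orientation bin, so that a rotation of the patch by k discrete steps corresponds to a cyclic shift by k of both the sectors and the bins. The element ordering actually used in [25], as well as the base distance (the L1 distance is used here), may differ, and the function names are hypothetical.

```python
import numpy as np

def rotate_descriptor(h, k):
    """Cyclically shift angular sectors and orientation bins by k discrete steps.

    Assumed layout: h.shape == (rings, m, m) = (ring, sector, orientation bin).
    """
    return np.roll(np.roll(h, k, axis=1), k, axis=2)

def shift_distance(h1, h2):
    """Minimum L1 distance between h1 and all m discrete rotations of h2.

    Returns (distance, shift), where `shift` is the rotation of h2 that
    attains the minimum. No histogram is recomputed: only elements are shifted.
    """
    m = h1.shape[1]
    distances = [np.abs(h1 - rotate_descriptor(h2, k)).sum() for k in range(m)]
    best = int(np.argmin(distances))
    return float(distances[best]), best

# A random array stands in for a descriptor with 2 rings and m = 8 directions.
rng = np.random.default_rng(0)
h = rng.random((2, 8, 8))
distance, shift = shift_distance(rotate_descriptor(h, 3), h)
```

In this toy case the minimum is reached at `shift == 3` with zero distance, since the first argument is an exact rotated copy of the second.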
The sGLOH design can be used to further constrain the allowable dominant discrete orientations to be considered in the matching distance. When the range of the dominant discrete orientations is known a priori, it can be restricted directly, which defines a very fast matching strategy named sCOr (shifting Constrained Orientation). Alternatively, when no further information is given in advance, an adaptive distance measure based on a voting strategy can be used to obtain the sGOr (shifting Global Orientation) matching (see Section 2).
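The two strategies can be sketched under explicit assumptions, since the exact definitions are given in Section 2 and not reproduced here. The descriptor layout (rings, m, m) and the L1 distance are the same illustrative choices as before; the voting rule (each descriptor votes for the shift of its nearest neighbor, and the most voted shift with its two adjacent shifts is then allowed) is an assumed reading of the voting strategy, and all function names are hypothetical.

```python
import numpy as np

def rotate_descriptor(h, k):
    """Cyclic shift of sectors and bins; assumed layout (rings, m, m)."""
    return np.roll(np.roll(h, k, axis=1), k, axis=2)

def constrained_distance(h1, h2, shifts):
    """Minimum L1 distance restricted to the given discrete shifts of h2.

    With a fixed a priori `shifts`, this plays the role of an sCOr-like distance.
    Returns (distance, shift).
    """
    scored = [(float(np.abs(h1 - rotate_descriptor(h2, k)).sum()), k)
              for k in shifts]
    return min(scored)

def global_shift(set1, set2, m):
    """Assumed voting rule: every descriptor in set1 votes for the shift of
    its nearest neighbor in set2, searched over all m shifts."""
    votes = np.zeros(m, dtype=int)
    for h1 in set1:
        _, k = min(constrained_distance(h1, h2, range(m)) for h2 in set2)
        votes[k] += 1
    return int(np.argmax(votes))

def neighbor_shifts(g, m):
    """The most voted shift together with its two adjacent shifts (assumption)."""
    return [(g - 1) % m, g, (g + 1) % m]

# Toy data: the second image is the first one rotated by 2 discrete steps.
rng = np.random.default_rng(1)
set2 = [rng.random((2, 8, 8)) for _ in range(5)]
set1 = [rotate_descriptor(h, 2) for h in set2]

g = global_shift(set1, set2, 8)            # estimated global shift
allowed = neighbor_shifts(g, 8)            # shifts kept for the final matching
d, k = constrained_distance(set1[0], set2[0], allowed)
```

Restricting the shifts reduces the number of distance evaluations per pair from m to the size of the allowed set, and it excludes orientations that are inconsistent with the scene context.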
In order to assess the properties of the novel matching strategies, different experiments, reported in Section 3, were carried out on both planar and non-planar scenes. To provide more insights, we also evaluated the case when more than just the first dominant orientation is used in SIFT and GLOH. The rotation-invariant MROGH [14] and LIOP [15] were also included in the evaluation due to the increasing interest in them in recent works [8], [14].
In the case of non-planar scenes, a novel dataset was created which employs a new evaluation protocol based on the approximated overlap error [26], [27]. This evaluation protocol provides an effective analysis in the case of non-planar scenes, extending the current state-of-the-art results [20], [22]. Section 4 reports final comments and conclusions.
The proposed matching strategy
Patch normalization and orientation methods are presented before defining the keypoint matching with sCOr and sGOr, as well as details on the sGLOH descriptor [25], which is essential in the matching pipeline since it allows constraints on the range of allowable orientations.
Experimental evaluation
In order to evaluate the proposed matching approaches, comparisons with SIFT, GLOH, LIOP, MROGH and the original sGLOH were carried out, both in the planar and non-planar cases. The HarrisZ detector [34], which selects robust and stable Harris corners in the affine scale-space, was used. Previous evaluations [34] have shown that it is comparable with the state-of-the-art detectors and provides better keypoints than Harris-affine. Moreover, although descriptors are influenced by detectors, the
Conclusions
In this paper we have shown how to improve the discriminative power of histogram-based keypoint descriptors by constraining the range of allowable orientations according to the scene under observation. The range is obtained by computing a global gradient orientation based on the image matching context in the case of sGOr, or it is provided a priori in the case of sCOr.
We tested the proposed descriptors together with SIFT, GLOH and recent rotation invariant descriptors by intensity order pooling, MROGH
Acknowledgments
This work was supported partially by grant B71J12001380001, University of Palermo FFR 2012/2013.
References (35)
- et al., Description of interest regions with local binary patterns, Pattern Recogn. (2009)
- et al., Speeded-up robust features (SURF), Comput. Vis. Image Underst. (2008)
- et al., Evaluation of two-view geometry methods with automatic ground-truth generation, Image Vis. Comput. (2013)
- et al., Using orientation codes for rotation-invariant template matching, Pattern Recogn. (2004)
- Generalizing the Hough transform to detect arbitrary shapes, Pattern Recogn. (1981)
- et al., Beyond bags of features: spatial pyramid matching for recognizing natural scene categories
- et al., A comparative evaluation of interest point detectors and local descriptors for visual SLAM, Mach. Vis. Appl. (2010)
- et al., Modeling the world from internet photo collections, Int. J. Comput. Vis. (2008)
- et al., A comparison of affine region detectors, Int. J. Comput. Vis. (2005)
- et al., A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
- Non-parametric local transforms for computing visual correspondence
- Evaluation of local detectors and descriptors for fast feature matching
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis.
- PCA-SIFT: a more distinctive representation for local image descriptors
- A sparse texture representation using local affine regions, IEEE Trans. Pattern Anal. Mach. Intell.
- Three things everyone should know to improve object retrieval
- Rotationally invariant descriptors using intensity order pooling, IEEE Trans. Pattern Anal. Mach. Intell.
☆ This paper has been recommended for acceptance by Cornelia M Fermuller.