Elsevier

Image and Vision Computing

Volume 32, Issue 9, September 2014, Pages 559-567

Keypoint descriptor matching with context-based orientation estimation

https://doi.org/10.1016/j.imavis.2014.05.002

Highlights

  • Novel matching strategies for histogram-based descriptors are presented.

  • Global dominant orientation is used by exploiting the image context.

  • A new 3D extensible framework to evaluate feature descriptors is introduced.

  • 2D/3D comparisons with state-of-the-art rotation-invariant descriptors are reported.

  • Results show the effectiveness of the proposed matching approaches.

Abstract

This paper presents a matching strategy to improve the discriminative power of histogram-based keypoint descriptors by constraining the range of allowable dominant orientations according to the context of the scene under observation. This is possible when the descriptor uses a circular grid and quantized orientation steps, by computing a global reference orientation from the feature matches or by providing it a priori.

The proposed matching strategy is compared with the standard approaches used with the SIFT and GLOH descriptors and the recent rotation invariant MROGH and LIOP descriptors. A new evaluation protocol based on an approximated overlap error is presented to provide an effective analysis in the case of non-planar scenes, thus extending the current state-of-the-art results.

Introduction

Keypoints extracted from digital images have been adopted with good results as primitives in many computer vision tasks, such as recognition [1], tracking [2] and 3D reconstruction [3]. The detection and extraction of meaningful image regions, called keypoints or image features, is usually the first step of these methodologies. Numerical vectors that embody the properties of each image region are then computed, so that the keypoints can be compared as required by the particular task.

During the last decade, different feature detectors have been proposed that are invariant to affine transformations, or only to scale and rotation, including, but not limited to, corner and blob detectors. The reader may refer to [4] for a general overview.

After a keypoint has been located, a descriptor vector that embodies the characteristic properties of the keypoint support region (i.e., its neighborhood) is computed. Existing descriptors can be divided mainly into two categories: distribution-based descriptors and banks of filters. In general, the former give better results, while the latter provide more compact descriptors. Banks of filters include complex filters, color moments, the local jet of the keypoint, differential operators and Haar wavelet coefficients. Refer to [5] for more details.

Distribution-based descriptors, also called histogram-based descriptors, divide the keypoint region (the feature patch) into different areas and compute, for each area, histograms of specific image properties. The final descriptor is the ordered concatenation of these histograms. The rank and census transforms [6], which binarize comparisons between the intensity of the central pixel and the intensities of its neighbors, are precursors of histogram-based descriptors, and the CS-LBP descriptor [7] can be considered an extension of this kind of approach. Other notable examples are the spin image, shape context and geometric blur descriptors, as well as the more recent DAISY, BRIEF, BRISK and FREAK descriptors (see [5], [8]).
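As an illustration of the two ingredients just described, the sketch below computes an 8-bit census-transform code per pixel (a bit is set when a neighbor is darker than the central pixel) and then builds a toy histogram-based descriptor by concatenating per-cell histograms of those codes over a 2x2 grid. The grid, the neighbor ordering and the comparison direction are illustrative assumptions; this is not the exact formulation of [6] or [7].

```python
from typing import List

Patch = List[List[int]]  # grayscale intensities, row-major

# Offsets of the 8 neighbors, clockwise from the top-left (illustrative ordering).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def census_code(patch: Patch, r: int, c: int) -> int:
    """8-bit census code of pixel (r, c): bit i is set when neighbor i is darker than the center."""
    center = patch[r][c]
    code = 0
    for i, (dr, dc) in enumerate(NEIGHBORS):
        if patch[r + dr][c + dc] < center:
            code |= 1 << i
    return code


def census_histogram_descriptor(patch: Patch, cells: int = 2, bins: int = 256) -> List[int]:
    """Ordered concatenation of census-code histograms over a cells x cells grid.

    Only interior pixels (those with a full 3x3 neighborhood) are used."""
    h, w = len(patch), len(patch[0])
    rows, cols = h - 2, w - 2
    hists = [[0] * bins for _ in range(cells * cells)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Map the interior pixel to its grid cell.
            cell_r = (r - 1) * cells // rows
            cell_c = (c - 1) * cells // cols
            hists[cell_r * cells + cell_c][census_code(patch, r, c)] += 1
    return [v for hist in hists for v in hist]


patch = [[(3 * r + 5 * c) % 17 for c in range(6)] for r in range(6)]
desc = census_histogram_descriptor(patch)
```

With a 6x6 patch, the 16 interior pixels fall four per cell, giving a 4 × 256 = 1024-element vector. CS-LBP [7] instead compares the four center-symmetric pairs of neighbors, which yields only 16 codes per pixel and therefore a much shorter histogram per cell.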

One of the most popular histogram-based descriptors is SIFT (Scale Invariant Feature Transform) [9], a 3D histogram of gradient locations and orientations on a Cartesian grid. SIFT has been extended in various ways since its introduction. PCA-SIFT [10] increases robustness and reduces the descriptor length by applying PCA (Principal Component Analysis); RIFT (Rotation Invariant Feature Transform) [11] is a ring-based rotation-invariant version; GLOH (Gradient Location and Orientation Histogram) [5] combines a log-polar grid with PCA; and SURF [12] is an efficient discrete SIFT variant. More recently, RootSIFT [13] improves upon SIFT by normalizing the descriptor vector with the Manhattan norm instead of the conventional Euclidean norm and then replacing the Euclidean distance with the Bhattacharyya distance. MROGH (Multi Support Region Order Based Gradient Histogram) [14] combines multiple overlapping support regions through intensity order pooling, while LIOP (Local Intensity Order Pattern) [15] uses intensity order pooling together with the relative order of neighboring pixels to define the histogram.
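The sketch below illustrates the RootSIFT mapping of [13] as we understand it: L1-normalize a non-negative descriptor, then take element-wise square roots. The mapped vectors have unit Euclidean norm, so the squared Euclidean distance between them equals 2 − 2·BC(p, q), where BC(p, q) = Σ√(p_i·q_i) is the Bhattacharyya coefficient of the normalized histograms. Comparing RootSIFT vectors with the Euclidean distance therefore amounts to a Bhattacharyya-type measure on the original descriptors. Function names are illustrative.

```python
import math
from typing import List, Sequence


def root_sift(desc: Sequence[float]) -> List[float]:
    """RootSIFT-style mapping: L1-normalize a non-negative descriptor, then take square roots."""
    total = sum(desc)
    if total <= 0:
        return [0.0] * len(desc)
    return [math.sqrt(v / total) for v in desc]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of sqrt(p_i * q_i) after L1-normalizing both histograms."""
    sp, sq = sum(p), sum(q)
    return sum(math.sqrt((x / sp) * (y / sq)) for x, y in zip(p, q))


a = [4.0, 0.0, 1.0, 3.0]
b = [2.0, 2.0, 0.0, 4.0]
lhs = euclidean(root_sift(a), root_sift(b)) ** 2
rhs = 2.0 - 2.0 * bhattacharyya_coefficient(a, b)
```

The two quantities `lhs` and `rhs` agree up to floating-point rounding, which is the identity stated above.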

Over the last few years, machine learning techniques have been applied to decorrelate the descriptor elements and to reduce the descriptor dimensionality [10], [16], and different histogram distances have been investigated to improve matching [17], [18].
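As one example of a histogram distance (the text does not specify the distances used in [17], [18], so this choice is ours), the chi-squared distance divides each squared bin difference by the bin mass. A given absolute difference therefore weighs more in sparsely populated bins than in heavily populated ones, unlike the Euclidean distance.

```python
from typing import Sequence


def chi2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Chi-squared distance between two non-negative histograms.

    Bins that are empty in both histograms are skipped to avoid division by zero."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if x + y > 0:
            total += (x - y) ** 2 / (x + y)
    return 0.5 * total
```

For instance, [60, 1] vs [50, 1] and [50, 1] vs [50, 11] are both at Euclidean distance 10, but their chi-squared distances are about 0.45 and 4.17 respectively.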

Different methodologies for evaluating feature descriptors and detectors have been proposed [4], [5], [8], [16], [19], [20], [21], [22], [23]. For planar scenes, the Oxford dataset benchmark [4], [24] is a well-established de facto standard, although its extension to non-planar scenes is not straightforward [19]. Other evaluation methodologies use laser-scanner images [21], structure-from-motion algorithms [16], [23] or epipolar reprojection on more than two images [20], but in general they require a complex and error-prone setup.

This paper presents in Section 2 a matching strategy to improve the discriminative power of histogram-based keypoint descriptors by constraining the range of allowable orientations according to the scene context.

The proposed matching strategy builds on the sGLOH (shifting GLOH) descriptor, presented in our previous work [25] and described in Section 2. sGLOH uses a circular grid to embed multiple descriptor instances of the same feature patch, each with a different dominant discrete orientation, into a single feature vector. Each descriptor instance is accessible by an internal shift of the feature vector elements, without recomputing the histograms. The matching distance between two features is then defined as the minimum distance among all descriptor instances, i.e., over all possible dominant discrete orientations.
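The shift mechanism can be sketched as follows, assuming (for illustration only) a descriptor stored as rings × sectors × orientation bins, with as many orientation bins as sectors per ring, so that one discrete rotation step moves every sector and every orientation bin by one position. The L1 distance, the shift direction and the function names are illustrative choices, not necessarily those of [25].

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: desc[ring][sector][orientation_bin].
Descriptor = List[List[List[float]]]


def rotate(desc: Descriptor, k: int) -> Descriptor:
    """Descriptor instance of the patch rotated by k discrete steps.

    Obtained by cyclically shifting sectors and orientation bins together,
    without recomputing any histogram."""
    n = len(desc[0])
    out = [[[0.0] * n for _ in range(n)] for _ in desc]
    for r, ring in enumerate(desc):
        if len(ring) != n or any(len(sector) != n for sector in ring):
            raise ValueError("expected n sectors x n orientation bins per ring")
        for s in range(n):
            for b in range(n):
                out[r][(s + k) % n][(b + k) % n] = ring[s][b]
    return out


def l1(a: Descriptor, b: Descriptor) -> float:
    """L1 distance between two descriptors with the same layout."""
    return sum(abs(x - y)
               for ra, rb in zip(a, b)
               for sa, sb in zip(ra, rb)
               for x, y in zip(sa, sb))


def sgloh_distance(a: Descriptor, b: Descriptor,
                   shifts: Optional[Sequence[int]] = None) -> Tuple[float, int]:
    """Minimum distance over the allowed rotation steps of b, with the step that achieves it.

    shifts=None tries all n discrete orientations."""
    n = len(b[0])
    steps = range(n) if shifts is None else shifts
    return min((l1(a, rotate(b, k)), k % n) for k in steps)


# Two rings, 8 sectors, 8 orientation bins, with distinct entries.
a = [[[float(100 * r + 10 * s + b) for b in range(8)] for s in range(8)] for r in range(2)]
rotated = rotate(a, 3)
```

Here `sgloh_distance(rotated, a)` returns `(0.0, 3)`: the minimum is reached at the shift that undoes the rotation. Passing a short list such as `shifts=[7, 0, 1]` restricts the minimum to rotations close to the upright pose, which is the kind of restriction that sCOr imposes.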

The sGLOH design can also be used to further constrain the dominant discrete orientations considered in the matching distance. This finer selection of the allowable range can be made a priori, yielding a very fast matching strategy named sCOr (shifting Constrained Orientation), or, when no information is available in advance, through an adaptive distance measure driven by a voting strategy, yielding the sGOr (shifting Global Orientation) matching (see Section 2).
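The two strategies can be sketched on top of a precomputed table of shifted-descriptor distances. This is a simplified reading of the text, not the authors' implementation: the table layout `dist[i][j][k]`, the ±1-step window, the plain nearest-neighbor matching and the majority vote over the shifts of the unconstrained matches are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical input: dist[i][j][k] is the distance between feature i of the first
# image and feature j of the second, with j's descriptor shifted by k discrete steps.
DistTable = List[List[List[float]]]
Match = Tuple[int, int, int]  # (i, j, k)


def match(dist: DistTable, allowed: Sequence[int]) -> List[Match]:
    """Nearest-neighbor matching that minimizes only over the allowed shifts."""
    result = []
    for i, row in enumerate(dist):
        _, j, k = min((row[j][k], j, k) for j in range(len(row)) for k in allowed)
        result.append((i, j, k))
    return result


def window(center: int, n: int, half_width: int) -> List[int]:
    """Shifts within +/- half_width of center, modulo n."""
    return sorted({(center + d) % n for d in range(-half_width, half_width + 1)})


def scor(dist: DistTable, n: int, half_width: int = 1) -> List[Match]:
    """sCOr-like matching: the allowed range is fixed a priori (here, around the upright pose)."""
    return match(dist, window(0, n, half_width))


def sgor(dist: DistTable, n: int, half_width: int = 1) -> Tuple[int, List[Match]]:
    """sGOr-like matching (simplified): vote for a global shift using unconstrained
    matches, then re-match within a window around the most voted shift."""
    votes = Counter(k for _, _, k in match(dist, range(n)))
    g = votes.most_common(1)[0][0]
    return g, match(dist, window(g, n, half_width))


# Toy scene rotated by 2 steps: true pairs (i, i) at shift 2, plus one spurious candidate.
n = 8
dist = [[[1.0] * n for _ in range(4)] for _ in range(4)]
for i in range(4):
    dist[i][i][2] = 0.1
dist[0][1][6] = 0.05
g, matches = sgor(dist, n)
```

In this toy table, the unconstrained matching pairs feature 0 with the spurious candidate at shift 6, but the vote selects the global shift 2 supported by the other matches, and re-matching within that window recovers the correspondence (0, 0).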

To assess the properties of the novel matching strategies, the experiments reported in Section 3 were carried out on both planar and non-planar scenes. To provide further insight, we also evaluated SIFT and GLOH when more than just the first dominant orientation is used. The rotation-invariant MROGH [14] and LIOP [15] descriptors were also included in the evaluation because of the increasing interest in them in recent works [8], [14].

For non-planar scenes, a novel dataset was created, together with a new evaluation protocol based on the approximated overlap error [26], [27]. This protocol provides an effective analysis of non-planar scenes, extending the current state-of-the-art results [20], [22]. Section 4 reports final comments and conclusions.
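The approximated overlap error of [26], [27] is not reproduced here. As background, the sketch below computes the standard overlap error used in planar benchmarks, 1 − |A ∩ B| / |A ∪ B|, for the simplified case where both regions are circles already expressed in the same image frame. Real evaluations use affine ellipses mapped through the ground-truth transformation; the circular simplification and the function names are assumptions made for brevity.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def circle_intersection_area(r1: float, r2: float, d: float) -> float:
    """Area of the intersection of two circles with radii r1, r2 and center distance d."""
    if d >= r1 + r2:
        return 0.0  # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # one circle inside the other
    # Clamp acos arguments and the radicand against rounding near the boundary cases.
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    radicand = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return r1 ** 2 * math.acos(c1) + r2 ** 2 * math.acos(c2) - 0.5 * math.sqrt(max(0.0, radicand))


def overlap_error(c1: Point, r1: float, c2: Point, r2: float) -> float:
    """1 - |A intersect B| / |A union B| for two circular regions."""
    inter = circle_intersection_area(r1, r2, math.dist(c1, c2))
    union = math.pi * r1 ** 2 + math.pi * r2 ** 2 - inter
    return 1.0 - inter / union
```

Identical regions give an error of 0, disjoint regions give 1, and two concentric circles of radii 1 and 2 give 0.75; a correspondence is typically accepted as correct when this error is below a fixed threshold.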


The proposed matching strategy

This section first presents the patch normalization and orientation methods, together with details of the sGLOH descriptor [25], which is essential to the matching pipeline because it allows the range of allowable orientations to be constrained; keypoint matching with sCOr and sGOr is then defined.
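The snippet does not say which orientation methods are used. As a reference point for the SIFT and GLOH baselines (including the variant with more than one dominant orientation), the sketch below follows the dominant orientation assignment described by Lowe [9]: a 36-bin histogram of gradient orientations weighted by gradient magnitude, keeping every local peak within 80% of the highest one. Gaussian weighting, histogram smoothing and peak interpolation are omitted, and the (magnitude, angle) input format is an assumption.

```python
import math
from typing import Iterable, List, Tuple


def dominant_orientations(samples: Iterable[Tuple[float, float]], bins: int = 36,
                          ratio: float = 0.8) -> List[float]:
    """Dominant gradient orientations of a patch, as bin centers in radians.

    samples: (magnitude, angle) pairs, angle in radians.
    A bin is kept when it is a local maximum of the circular histogram and its
    value is at least `ratio` times the global maximum."""
    hist = [0.0] * bins
    width = 2 * math.pi / bins
    for mag, ang in samples:
        hist[int((ang % (2 * math.pi)) / width) % bins] += mag
    top = max(hist)
    if top == 0:
        return []
    peaks = []
    for b in range(bins):
        left, right = hist[(b - 1) % bins], hist[(b + 1) % bins]
        # Strict on the left, non-strict on the right: a flat two-bin plateau yields one peak.
        if hist[b] >= ratio * top and hist[b] > left and hist[b] >= right:
            peaks.append((b + 0.5) * width)
    return peaks


samples = ([(1.0, 0.05)] * 10
           + [(1.0, math.pi + 0.01)] * 9
           + [(1.0, math.pi / 2 + 0.01)] * 3)
peaks = dominant_orientations(samples)
```

Here the peak near π holds 90% of the highest peak, so two dominant orientations are returned, while the weaker peak near π/2 is discarded. The sGLOH-based strategies avoid this step at matching time, since all discrete orientations are available by shifting.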

Experimental evaluation

In order to evaluate the proposed matching approaches, comparisons with SIFT, GLOH, LIOP, MROGH and the original sGLOH were carried out in both the planar and non-planar cases. The HarrisZ detector [34], which selects robust and stable Harris corners in the affine scale-space, was used. Previous evaluations [34] have shown that it is comparable with state-of-the-art detectors and provides better keypoints than Harris-affine. Moreover, although descriptors are influenced by detectors, the

Conclusions

In this paper we have shown how to improve the discriminative power of histogram-based keypoint descriptors by constraining the range of allowable orientations according to the scene under observation. The global gradient orientation is either computed from the image matching context, in the case of sGOr, or provided a priori, in the case of sCOr.

We tested the proposed descriptors together with SIFT, GLOH and recent rotation-invariant descriptors based on intensity order pooling, MROGH

Acknowledgments

This work was supported partially by grant B71J12001380001, University of Palermo FFR 2012/2013.

References (35)

  • R. Zabih et al., Non-parametric local transforms for computing visual correspondence

  • O. Miksik et al., Evaluation of local detectors and descriptors for fast feature matching

  • D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)

  • Y. Ke et al., PCA-SIFT: a more distinctive representation for local image descriptors

  • S. Lazebnik et al., A sparse texture representation using local affine regions, IEEE Trans. Pattern Anal. Mach. Intell. (2005)

  • R. Arandjelović et al., Three things everyone should know to improve object retrieval

  • B. Fan et al., Rotationally invariant descriptors using intensity order pooling, IEEE Trans. Pattern Anal. Mach. Intell. (2012)

Cited by (26)

    • A visual framework to create photorealistic retinal vessels for diagnosis purposes

      2020, Journal of Biomedical Informatics
      Citation Excerpt :

      In other cases, an initial training must be performed to produce supervised methods, which seem to be strictly dependent on the values set by the user. A promising approach based on keypoints [10] was described in [11]. Regarding the study of the optic disc, a circular luminous object was proposed in [12] to define the unique round structure.

    • A multi-vision sensor-based fast localization system with image matching for challenging outdoor environments

      2015, Expert Systems with Applications
      Citation Excerpt :

      To make objective observations, SURF (Bay et al., 2008), Affine-SIFT (Guoshen & Morel, 2009), FAST (Rosten et al., 2010), and the extraction from the feature point of the proposed virtual view feature extraction are evaluated for a data set having both illumination and viewpoint changes. These are combined into SURF (Bay et al., 2008), CS-LBP (Kim et al., 2012), BRIEF (Calonder, 2011), sGLOH (Ballavia et al., 2014), and the proposed generation method of an illumination robust descriptor. The parameter value of the algorithm used is the most ideal value, and the estimate of the algorithm is implemented by average matching rate and processing time.

    • Orthogonal moments for determining correspondence between vessel bifurcations for retinal image registration

      2015, Computer Methods and Programs in Biomedicine
      Citation Excerpt :

      The paper describes application of Random Sample Consensus (RANSAC) to the location determination problem (LDP) of known landmark points. Implementation details and comparison of histogram based keypoint descriptors is presented in Ref. [17]. Stewart et al. propose a dual bootstrap iterative closest point algorithm [18].

    • Relative Feature Orientation Filtering in COLMAP Structure from Motion

      2023, International Conference Image and Vision Computing New Zealand
    • Retinal image synthesis through the least action principle

      2020, ICIIBMS 2020 - 5th International Conference on Intelligent Informatics and Biomedical Sciences

    This paper has been recommended for acceptance by Cornelia M Fermuller.
