Comparing salient point detectors

https://doi.org/10.1016/S0167-8655(02)00192-7

Abstract

The use of salient points in content-based retrieval allows an image index to represent local properties of the image. Classic corner detectors can also be used for this purpose, but they have drawbacks when applied to various natural images, mainly because visual features do not need to be corners and corners may gather in small regions. In this paper, we present a salient point detector using the wavelet transform and compare it with two corner detectors using two criteria: repeatability rate and information content. We determine which detector gives the best results and show that it satisfies the criteria well.

Introduction

Many computer vision tasks rely on low level features. A wide variety of feature detectors exist and results can vary enormously depending on the detector used. An image is “summarized” by a set of features, the image index, to allow fast querying. Local features are of interest since they lead to an index based on local properties of the image. The feature extraction is limited to a subset of the image pixels, the interest points, where the image information is supposed to be the most important (Schmid and Mohr, 1997; Sebe et al., 2000). Besides saving time in the indexing process, these points may lead to a more discriminant index because they are related to the visually most important parts of the image.

Haralick and Shapiro (1993) consider a point in an image interesting if it has two main properties: distinctiveness and invariance. This means that a point should be distinguishable from its immediate neighbors and the position as well as the selection of the interesting point should be invariant with respect to the expected geometric and radiometric distortions.

Schmid and Mohr (1997) proposed the use of corners as interest points in image retrieval using the Harris corner detector (Harris and Stephens, 1988). The basic idea is to use the auto-correlation function in order to determine locations where the signal changes in two directions. A matrix related to the auto-correlation function, which takes into account the first derivatives of the signal on a window, is computed. The eigenvalues of this matrix are the principal curvatures of the auto-correlation function. Two significant eigenvalues indicate the presence of an interest point.
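The idea in the paragraph above can be illustrated with a minimal sketch. It is not the Harris and Stephens implementation: it assumes central-difference derivatives and an unweighted square window (the original detector uses Gaussian weighting), and the function names `structure_matrix` and `eigenvalues` are hypothetical. The image is a plain list of rows of grey values.

```python
import math

def structure_matrix(img, r, c, half=1):
    """Sum products of first derivatives over a (2*half+1)^2 window at (r, c).

    Assumption: central differences and uniform window weights.
    Returns the entries (a, b, d) of the symmetric matrix [[a, b], [b, d]].
    """
    a = b = d = 0.0
    for i in range(r - half, r + half + 1):
        for j in range(c - half, c + half + 1):
            ix = (img[i][j + 1] - img[i][j - 1]) / 2.0  # horizontal derivative
            iy = (img[i + 1][j] - img[i - 1][j]) / 2.0  # vertical derivative
            a += ix * ix
            b += ix * iy
            d += iy * iy
    return a, b, d

def eigenvalues(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]], largest first."""
    mean = (a + d) / 2.0
    root = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean + root, mean - root

# Synthetic 9x9 image: a bright square occupying the lower-right quadrant.
image = [[1.0 if (i >= 4 and j >= 4) else 0.0 for j in range(9)] for i in range(9)]

corner = eigenvalues(*structure_matrix(image, 4, 4))  # at the square's corner
edge = eigenvalues(*structure_matrix(image, 6, 4))    # on its vertical edge
flat = eigenvalues(*structure_matrix(image, 2, 2))    # in the uniform region
```

On this synthetic image the corner yields two clearly non-zero eigenvalues (1.25 and 0.75), the edge yields one (1.5 and 0.0), and the flat region yields none, which matches the "two significant values" criterion described in the text.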

Different interest point detectors are evaluated and compared by Schmid et al. (2000). Besides the Harris corner detector and an improved variant of it called PreciseHarris, the authors also consider the detectors proposed by Heitger et al. (1992), Förstner (1994), and Horaud et al. (1990). Heitger et al. (1992) developed an approach inspired by experiments on the biological visual system. They extract 1D directional characteristics by convolving the image with orientation-selective Gabor filters. In order to obtain 2D characteristics, they compute the first and second derivatives of the 1D characteristics. Förstner (1994) classifies image pixels into three categories (region, contour, and interest point) by using the auto-correlation function. Local statistics allow a blind estimate of signal-dependent noise variation and thus an automatic selection of thresholds. Horaud et al. (1990) extract line segments from the image contours. These segments are grouped and the intersections of grouped line segments are the interest points. Schmid et al. (2000) concluded that the best results are provided by the Harris detector (Harris and Stephens, 1988). Zheng et al. (1999) proposed a method derived from the Harris detector (in their paper they call it the Plessey corner detector). The most important improvement of their corner detector is that it decreases the complexity (instead of calculating the Gaussians they calculate smoothed gradient-multiple images). They conclude that the detection performance of their gradient-direction corner detector is slightly inferior to that of the Harris detector, but that its localization performance (defined as the closeness to the true location of the corner) is better than that of the Harris detector.

Corner detectors, however, were designed for robotics and shape recognition and they have drawbacks when applied to natural images. Visual focus points do not need to be corners: when looking at a picture, we are attracted by some parts of the image, which are the most meaningful for us. We cannot assume them to be located only in corner points, as mathematically defined in most corner detectors. For instance, a smoothed edge may have visual focus points and they are usually not detected by a corner detector. The image index we want to compute should describe them as well. Corners also gather in textured regions. The problem is that for efficiency reasons only a preset number of points per image can be used in the indexing process. Since in this case most of the detected points will be in a small region, the other parts of the image may not be described in the index at all. However, we do not want to have points in all possible regions: regions where there is nothing interesting (e.g., a region with a constant grey level) should not contain any “interesting” points.

We believe that other points based on image information can be extracted using approaches other than the corner differential framework. Studies on visual attention, more related to human vision, propose different models. The basic information is still the variation in the stimuli. However, this is no longer taken into account in a differential way but mainly from an energy point of view (Itti et al., 1998). Another approach is to integrate a scale space approach into the corner extraction algorithm (Lindeberg, 1998; Mikolajczyk and Schmid, 2001). The idea is to select a characteristic scale by searching for local extrema over scales.

In this context, we aim for a set of interesting points called salient points that are related to any visually interesting part of the image, whether it is smoothed or corner-like. Moreover, to describe different parts of the image, the set of salient points should not be clustered in a few regions. We believe a multiresolution representation is well suited to detecting salient points. Multiresolution representations are usually implemented using image pyramids. This representation has various properties that make it very popular in image processing and computer vision algorithms: (1) the adaptation of resolution is suitable for coarse-to-fine multigrid iteration strategies; (2) iterative algorithms that proceed by successive refinements usually require fewer computations and converge faster; (3) in the context of iterative algorithms, the smoothing effect of the pyramid reduces the likelihood of getting trapped in local extrema, which increases robustness; and (4) analogies can be made with the hierarchical organization of the human primary visual cortex. One of the earliest examples of a pyramid is due to Burt and Adelson (1986). Their Gaussian filtering, however, produces excessive smoothing, which leads to some loss of image details. Higher-quality image reduction can be obtained by designing a filter that is optimum in the least-squares sense (Unser, 1992) or by using the lowpass branch of a wavelet decomposition algorithm (Mallat, 1989).

Taking these into account, we present a salient point extraction algorithm that uses the wavelet transform, which expresses image variations at different resolutions. Our wavelet-based salient points are detected for smoothed edges and are not gathered in textured regions. Hence, they lead to a more complete image representation than corner detectors. The algorithm presented in this paper is an improved version of our algorithm presented in Loupias et al. (2000), Tian et al. (2001), and Loupias and Bres (2001). There we were interested in using the salient points in a content-based retrieval scenario and we showed that extracting color and texture features at the locations given by the salient points provided significantly improved results in terms of retrieval accuracy, computational complexity, and storage space of feature vectors as compared to global feature approaches. In a content-based retrieval application the geometric stability of the salient points is not really critical. There, feature stability is more important since image matching is done at the feature level. For example, even if a salient point moves along an edge, the matching does not change as long as the feature extracted at that point remains stable. However, if we want to use the salient points in other applications, such as object tracking and recognition or stereo matching, the geometric stability becomes really critical.

In order to evaluate the “interestingness” of the points (as introduced by Haralick and Shapiro (1993)), two criteria are considered: repeatability rate and information content. The repeatability rate evaluates the geometric stability of points under different image transformations. Information content measures the distinctiveness of the greylevel pattern at an interest point. A local pattern is described using rotationally invariant combinations of derivatives. The entropy of these invariants is measured for a set of interest points.
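As a rough illustration of the second criterion, the sketch below estimates the entropy of a set of descriptors. It makes several assumptions not stated in the text: descriptors are short tuples of numbers, they are quantized into cubic cells of side `cell`, and the entropy is the Shannon entropy (in bits) of the cell occupancy. The function name `information_content` and the parameter `cell` are hypothetical.

```python
import math
from collections import Counter

def information_content(descriptors, cell=1.0):
    """Shannon entropy (bits) of descriptors after quantization into cells.

    Assumption: each descriptor is a sequence of floats, and two descriptors
    count as identical when they fall into the same cell of side `cell`.
    """
    counts = Counter(
        tuple(math.floor(v / cell) for v in d) for d in descriptors
    )
    n = len(descriptors)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Descriptors spread over four different cells versus packed into one cell.
spread = [(0.2, 0.1), (1.5, 0.3), (0.4, 2.7), (3.1, 3.9)]
packed = [(0.2, 0.1), (0.3, 0.4), (0.5, 0.6), (0.7, 0.8)]
```

With these inputs the spread set gives an entropy of 2 bits (four equally likely cells) and the packed set gives 0 bits, so a higher value corresponds to more distinctive descriptors.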

Section snippets

Wavelet-based salient points

The wavelet representation gives information about the variations in the image at different scales. We would like to extract salient points from any part of the image where something happens at any resolution. A high wavelet coefficient (in absolute value) at a coarse resolution corresponds to a region with high global variations. The idea is to find a relevant point to represent this global variation by looking at wavelet coefficients at finer resolutions.
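The coarse-to-fine idea described above can be sketched in one dimension. This is an illustrative reading rather than the algorithm of the paper: it assumes a Haar decomposition with averaging normalization, a signal whose length is a power of two, and a tracking rule that follows, at each finer level, the child coefficient with the largest absolute value. The names `haar_details` and `track_to_finest` are hypothetical.

```python
def haar_details(signal):
    """Haar detail coefficients of a 1D signal, finest level first.

    Assumption: len(signal) is a power of two; averages and differences
    are divided by 2 (one of several common normalizations).
    """
    details = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        details.append([(a - b) / 2.0 for a, b in pairs])
        approx = [(a + b) / 2.0 for a, b in pairs]
    return details

def track_to_finest(details, level, index):
    """Follow a coefficient down to the finest level.

    The coefficient `index` at `level` has children 2*index and 2*index+1
    at the next finer level; the child with the largest absolute value is
    kept at each step. Returns the first sample covered by the
    finest-level coefficient that is reached.
    """
    while level > 0:
        finer = details[level - 1]
        index = max((2 * index, 2 * index + 1), key=lambda k: abs(finer[k]))
        level -= 1
    return 2 * index

# A step between samples 4 and 5.
signal = [0, 0, 0, 0, 0, 9, 9, 9]
details = haar_details(signal)
position = track_to_finest(details, len(details) - 1, 0)
```

Starting from the single coarsest coefficient, the tracking ends at sample 4, the first sample of the pair that straddles the step, so the global variation is represented by a point next to where the signal actually changes.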

A wavelet is an oscillating and

Repeatability

Repeatability is defined by the image geometry. Given a 3D point P and two projection matrices M1 and M2, the projections of P into two images I1 and I2 are p1=M1P and p2=M2P. The point p1 detected in image I1 is repeated in image I2 if the corresponding point p2 is detected in image I2. To measure the repeatability, a unique relation between p1 and p2 has to be established. In the case of a planar scene this relation is defined by a homography: p2=H21p1.
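The planar case can be sketched as follows. The code maps each point detected in I1 through H21 and counts it as repeated if some point detected in I2 lies within a tolerance. Several choices here are assumptions rather than definitions taken from the text: the tolerance `eps` (in pixels), the normalization by the smaller of the two point counts, and the omission of any restriction to the region visible in both images. The names `apply_homography` and `repeatability` are hypothetical.

```python
import math

def apply_homography(H, p):
    """Map the 2D point p = (x, y) through the 3x3 homography H."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def repeatability(points1, points2, H21, eps=1.5):
    """Fraction of repeated points between two detections.

    Assumptions: a point p1 is repeated if some detected p2 lies within
    `eps` pixels of H21 * p1; the count is divided by min(n1, n2).
    Matching is not forced to be one-to-one.
    """
    if not points1 or not points2:
        return 0.0
    repeated = 0
    for p1 in points1:
        qx, qy = apply_homography(H21, p1)
        if any(math.hypot(qx - x2, qy - y2) <= eps for x2, y2 in points2):
            repeated += 1
    return repeated / min(len(points1), len(points2))

# Example: the second image is the first one shifted 10 pixels to the right.
H21 = [[1.0, 0.0, 10.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
detected1 = [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0)]
detected2 = [(10.0, 0.0), (15.0, 6.0), (100.0, 100.0)]
rate = repeatability(detected1, detected2, H21)
```

In this example two of the three points in the first image have a counterpart within 1.5 pixels after mapping, so the rate is 2/3.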

The percentage of detected points which

Information content

Information content is a measure of the distinctiveness of a salient point. Distinctiveness is based on the likelihood of a greyvalue descriptor computed at the point within the population of all observed salient point descriptors. Given one image, a descriptor is computed for each of the detected salient points and the information content will measure the distribution of these descriptors. If all descriptors are spread out, information content is high and matching is likely to succeed. On the

Results

In the experiments we used a set of 1000 images taken from the Corel database and we compared four salient point detectors. In Section 2 we introduced two salient point detectors using wavelets: Haar and Daubechies4. For benchmarking purposes we also considered the Harris corner detector (Harris and Stephens, 1988) and a variant of it called PreciseHarris, introduced by Schmid et al. (2000). The difference between the last two detectors is given by the way the derivatives are computed. Harris

Conclusion

We presented a salient point detector based on wavelets. The wavelet-based salient points are interesting because they are located in visual focus points without gathering in textured regions. We used the Haar transform for point extraction, which is simple but may lead to poor localization. A better approach is to use Daubechies4 wavelets, which avoid this drawback.

We also compared our wavelet-based salient point extraction algorithm with two corner detectors using the criteria: repeatability

Acknowledgements

We would like to thank Etienne Loupias who designed and contributed to the first salient point extraction algorithm.

