Random interest regions for object recognition based on texture descriptors and bag of features

https://doi.org/10.1016/j.eswa.2011.07.097

Abstract

In this work we propose a novel method for object recognition based on a random selection of interest regions, texture features (local binary/ternary patterns and local phase quantization) for describing each region, a bag-of-features approach for describing each object, and classification using support vector machines (SVMs). In our approach, a set of features is extracted from each subwindow of the object image. These sets are quantized, and the resulting global descriptor vector is used as a characterization of the image (e.g., as a feature vector for learning an image classification rule based on an SVM classifier). Standard texture descriptors are not widely used in region description. One of the first texture descriptors explored in region description is the CS-LBP descriptor, where a local binary pattern (LBP) feature is used as the local feature in the SIFT method, the best-known object recognition algorithm. Our approach based on texture descriptors is much simpler than the SIFT algorithm, yet it performs comparably well. Furthermore, we show that the fusion of our approach with SIFT obtains a very high AUC on the well-known PASCAL VOC2006 dataset.

Highlights

► A bag-of-features approach based on the combination of different texture descriptors.
► A random selection of interest regions coupled with standard texture descriptors yields a method simpler than SIFT.
► The fusion of descriptors outperforms the winner of VOC2006.

Introduction

Detecting and describing local image features is valuable in a number of applications, including wide baseline matching for stereo pairs (Baumberg, 2000, Tuytelaars and Gool, 2004), object retrieval in videos (Sivic, Schaffalitzky, & Zisserman, 2004), object recognition (Lowe, 2004), texture recognition (Lazebnik, Schmid, & Ponce, 2005), robot localization (Se, Lowe, & Little, 2002), visual data mining (Sivic & Zisserman, 2004), and symmetry detection (Turina, Tuytelaars, & Gool, 2001). The basic idea is to match regions of interest in one image to the same regions of interest in other images that were taken from different viewpoints. This is accomplished by first detecting regions in a given image that are covariant to a class of transformations. Invariant descriptors are then extracted for each region, and these are used to match identical regions between images. Region matching using local image features is tolerant to illumination changes, blur, zoom effects, various degrees of occlusion, and distortions in perspective.

This paper focuses on region description. For a comparison of region detection approaches, see Mikolajczyk et al. (2005). Once the regions are detected, invariant descriptors are extracted that can be used to match these regions in other images. Various approaches to region description have been proposed that emphasize different properties of images, such as color, texture, edges, and pixel intensities. In a number of comparison studies (e.g., Mikolajczyk & Schmid, 2005), the best results are obtained by descriptors that represent properties of the region of interest using histogram distributions. For example, the 2D histogram approach called intensity-domain spin image (Lazebnik et al., 2005) represents regions using the distance from the center point and intensity values. SIFT (Lowe, 2004), a 3D histogram, takes the gradient locations and orientations and weights them by the gradient magnitude and a Gaussian window that is superimposed over the region. Some interesting variations of the SIFT descriptor use a log-polar location grid instead of a Cartesian location grid (Mikolajczyk & Schmid, 2005). The SURF descriptor (Bay, Tuytelaars, & Gool, 2006) utilizes properties of the best region detection and description methods by coupling a Hessian matrix-based measure as a detector with Haar wavelet responses as the descriptor. Computational complexity is reduced in this method by relying on integral images for image convolutions. In Ling and Jacobs (2005) a geodesic intensity histogram (GIH) provides a deformation-invariant local descriptor. Other descriptors include PCA-SIFT (Ke & Sukthankar, 2004), moment invariants (Gool, Moons, & Ungureanu, 1996), and complex filters (Schaffalitzky & Zisserman, 2002).

In Winder and Brown (2007) the description process is divided into a number of modules that are plugged together in different combinations. Some of these combinations give rise to the region of interest descriptor approaches described above, such as SIFT, but others have yet to be explored. By dividing the process into modules, choice of parameter values can be optimized using learning algorithms.

There are a number of powerful texture operators that have yet to be explored in region description. In Heikkilä, Pietikäinen, and Schmid (2009) the gradient feature in SIFT is replaced by a newly proposed LBP-based texture descriptor called the center-symmetric local binary pattern (CS-LBP). The resulting descriptor is computationally simpler than SIFT and has the advantage of being more robust to illumination problems.
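The center-symmetric comparison behind CS-LBP can be illustrated with a minimal sketch. It assumes the commonly described formulation with 8 neighbours at radius 1, in which each of the 4 diametrically opposite neighbour pairs contributes one bit, giving 16 possible codes. The function names, the neighbour ordering, and the default threshold are illustrative assumptions, not details taken from this paper.

```python
def cs_lbp_code(patch, row, col, threshold=0.0):
    """Return the 4-bit CS-LBP code of the pixel at (row, col).

    `patch` is a list of rows of grey values. An 8-neighbourhood at
    radius 1 is assumed; `threshold` is a hypothetical default.
    """
    # Neighbours in circular order (E, NE, N, NW, W, SW, S, SE), so that
    # offsets[i] and offsets[i + 4] are diametrically opposite.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    code = 0
    for i in range(4):
        r1, c1 = offsets[i]
        r2, c2 = offsets[i + 4]
        diff = patch[row + r1][col + c1] - patch[row + r2][col + c2]
        if diff > threshold:
            code |= 1 << i
    return code


def cs_lbp_histogram(patch, threshold=0.0):
    """Histogram (16 bins) of CS-LBP codes over the interior pixels."""
    hist = [0] * 16
    for row in range(1, len(patch) - 1):
        for col in range(1, len(patch[0]) - 1):
            hist[cs_lbp_code(patch, row, col, threshold)] += 1
    return hist
```

Because only opposite neighbours are compared, the code is half as long as a standard 8-neighbour LBP code, which is one reason the descriptor stays compact.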

Finally, in Nowak, Jurie, and Triggs (2006), it is shown that for moderate to large numbers of interest regions, random sampling gives equal or better classification than the sophisticated multiscale interest operators that are in common use. Starting from this result, we simplify the standard SIFT as follows:

  • We divide each object image into subwindows and extract a set of texture features from the subwindows; we have tested local binary/ternary patterns and local phase quantization as the texture features.

  • For each of the 10 classes in the well-known PASCAL VOC2006 dataset, we compute textons by clustering the descriptors of the regions of each class with k-means; by concatenating the textons over the 10 classes, we obtain a global texton vocabulary.

  • We represent each image as a histogram of texton labels, and these histograms are used to train a support vector machine classifier.
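The vocabulary and histogram steps listed above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the region descriptors are made-up 2D points, the k-means routine is plain Lloyd's algorithm with a naive initialization, and all function names are hypothetical. SVM training on the resulting histograms is left out.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means; centres start at the first k points."""
    centres = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: squared_distance(p, centres[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centre if a cluster goes empty
                centres[j] = [sum(col) / len(group) for col in zip(*group)]
    return centres


def build_vocabulary(descriptors_by_class, textons_per_class):
    """Cluster each class separately, then concatenate the textons."""
    vocabulary = []
    for class_name in sorted(descriptors_by_class):
        vocabulary.extend(
            kmeans(descriptors_by_class[class_name], textons_per_class))
    return vocabulary


def texton_histogram(region_descriptors, vocabulary):
    """Normalised histogram of nearest-texton labels for one image."""
    hist = [0.0] * len(vocabulary)
    for d in region_descriptors:
        label = min(range(len(vocabulary)),
                    key=lambda j: squared_distance(d, vocabulary[j]))
        hist[label] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy data: two classes, four 2D "descriptors" each (purely illustrative).
descriptors_by_class = {
    "cat": [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]],
    "dog": [[5.0, 0.0], [5.0, 1.0], [0.0, 5.0], [1.0, 5.0]],
}
vocabulary = build_vocabulary(descriptors_by_class, textons_per_class=2)
image_hist = texton_histogram(
    [[0.0, 0.5], [10.0, 10.5], [5.0, 0.5]], vocabulary)
```

The length of each image histogram equals the number of classes times the number of textons per class, so every image is described by a vector of the same size regardless of how many regions were sampled from it.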

It has been reported in several papers (e.g., Nanni, Brahnam, & Lumini, 2010) that a random subspace ensemble of SVMs performs better than a stand-alone SVM and that a combination of descriptors improves performance. For this reason, we explore a random subspace ensemble of SVMs rather than a stand-alone SVM, and we use a fusion of different descriptors to improve performance. Our fusion approach obtains a very high AUC on the PASCAL VOC2006 dataset.
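The random subspace idea can be sketched as below: each ensemble member is trained on a random subset of the feature indices, and the member scores are then combined. To keep the sketch dependency-free, the base learner is a simple centroid-based linear scorer used as a stand-in, not an SVM. The number of members, the subspace fraction, and the averaging (sum) rule are assumptions made for illustration and are not taken from the paper.

```python
import random


def train_centroid_scorer(X, y):
    """Stand-in base learner (NOT an SVM): linear score from class centroids."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label != 1]
    mu_pos = [sum(col) / len(pos) for col in zip(*pos)]
    mu_neg = [sum(col) / len(neg) for col in zip(*neg)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + bias


def train_random_subspace(X, y, n_members=5, subspace_fraction=0.5, seed=0):
    """Train each member on its own random subset of feature indices."""
    rng = random.Random(seed)
    n_features = len(X[0])
    size = max(1, int(n_features * subspace_fraction))
    members = []
    for _ in range(n_members):
        idx = sorted(rng.sample(range(n_features), size))
        projected = [[x[i] for i in idx] for x in X]
        members.append((idx, train_centroid_scorer(projected, y)))
    return members


def ensemble_score(members, x):
    """Average the member scores; a positive value means class 1."""
    scores = [scorer([x[i] for i in idx]) for idx, scorer in members]
    return sum(scores) / len(scores)


# Toy data in which every feature separates the two classes.
X = [[1, 1, 1, 1], [2, 2, 2, 2], [-1, -1, -1, -1], [-2, -2, -2, -2]]
y = [1, 1, 0, 0]
members = train_random_subspace(X, y, n_members=3, subspace_fraction=0.5)
```

Swapping the stand-in scorer for an SVM leaves the ensemble logic unchanged, since each member only needs to map a projected feature vector to a real-valued score.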

Section snippets

SIFT descriptor

The SIFT descriptor (Lowe, 2004) is a 3D histogram that takes the gradient locations and orientations and weights them by the gradient magnitude and a Gaussian window that is superimposed over the region. The location is quantized into a 4 × 4 location grid. The gradient angle is quantized into eight orientations. A trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins to offset boundary effects in the presence of small shifts of the interest

Proposed approach

In this work we have simplified the procedure of standard SIFT, producing a very simple and efficient method based on the following steps:

  • STEP1: The image is normalized using contrast-limited adaptive histogram equalization (the function adapthisteq.m in MATLAB). The image is then resized (doubling the dimensions of the image) until both the x-dimension and the y-dimension of the image are at least 50 pixels.
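The resizing rule in STEP1 can be sketched as a small helper that works out the final image size. The function names are hypothetical, and the sketch covers only the size computation: the contrast equalization and the pixel interpolation itself are not reproduced here.

```python
def doublings_needed(width, height, min_side=50):
    """Count how many times both dimensions must be doubled until
    each of them is at least `min_side` pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    count = 0
    while width < min_side or height < min_side:
        width *= 2
        height *= 2
        count += 1
    return count


def resized_shape(width, height, min_side=50):
    """Return the (width, height) of the image after the doubling rule."""
    factor = 2 ** doublings_needed(width, height, min_side)
    return width * factor, height * factor
```

For instance, a 20 × 30 image is doubled twice and ends up at 80 × 120, while an image that is already 50 × 50 or larger is left unchanged.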

Experimental results

The recognition of object categories is one of the most challenging problems in computer vision. In our experiments we use the PASCAL Visual Object Classes Challenge 2006 protocol (VOC2006) (Everingham, Zisserman, Williams, & VanGool, 2006). The dataset includes 10 categories of images: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, and sheep. We use trainval images for training and the test images for testing our classifier (see http://www.pascal-network.org/challenges/VOC/voc2006

Conclusions

In this paper we present a new method for recognizing object categories. The proposed method combines the strengths of high performing texture descriptors (LPQ and LTP/LBP) with a bag-of-features approach. Starting from the conclusion of Nowak et al. (2006), where it is shown that random sampling gives equal or better performance than the sophisticated multiscale interest operators that are in common use, we further simplify the idea of SIFT. We simply divide each object image into subwindows,

Acknowledgements

The authors would like to thank T. Ojala, M. Pietikäinen and T. Mäenpää for sharing their LBP code and V. Ojansivu and J. Heikkilä for sharing their LPQ code.

References (32)

  • M. Heikkilä et al. (2009). Description of interest regions with local binary patterns. Pattern Recognition.
  • L. Nanni et al. (2008). Local binary patterns for a hybrid fingerprint matcher. Pattern Recognition.
  • L. Nanni et al. (2010). Local binary patterns variants as texture descriptors for medical image analysis. Artificial Intelligence in Medicine.
  • Ahonen, T., & Pietikäinen, M. (2007). Soft histograms for local binary patterns. In Proc. finnish signal processing...
  • Ahonen, T., Matas, J., He, C., & Pietikäinen, M. (2009). Rotation invariant image description with local binary pattern...
  • Bay, H., Tuytelaars, T., & Gool, L. V. (2006). SURF: Speeded up robust features. In European conference on computer...
  • Baumberg, A. (2000). Reliable feature matching across widely separated views. In IEEE Conference on computer vision and...
  • N. Cristianini et al. (2000). An introduction to support vector machines and other kernel-based learning methods.
  • Everingham, M., Zisserman, A., Williams, C. K. I., & VanGool, L. (2006). The PASCAL visual object classes challenge...
  • Gool, L. J. V., Moons, T., & Ungureanu, D. (1996). Affine/photometric invariants for planar intensity patterns. In 4th...
  • Hafiane, A., Seetharaman, G., Palaniappan, K., & Zavidovique, B. (2008). Rotationally invariant hashing of median...
  • Iakovidis, D. K., Keramidas, E., & Maroulis, D. (2008). Fuzzy local binary patterns for ultrasound texture...
  • Ke, Y., & Sukthankar, R. (2004). PCA-SIFT: A more distinctive representation for local image descriptors. In IEEE...
  • S. Lazebnik et al. (2005). A sparse texture representation using local affine regions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Liao, S., & Chung, A. C. S. (2007). Face recognition by using elongated local binary patterns with average maximum...
  • S. Liao et al. (2009). Dominant local binary patterns for texture classification. IEEE Transactions on Image Processing.