Signal Processing

Volume 125, August 2016, Pages 304-314

Rotation invariant HOG for object localization in web images

https://doi.org/10.1016/j.sigpro.2016.01.016

Highlights

  • Geometrically invariant object localization/recognition in Web images.

  • The Rotation Invariant HOG (RIHOG).

  • Fast scale invariant object localization/recognition in images.

  • The Rapid Ranking Method (RRM) is based on top-down searching under RIHOG.

  • Geometrically invariant image ranking method.

Abstract

Localizing objects in Web images calls for an invariant descriptor. The HOG (histogram of oriented gradients) descriptor is widely used to increase localization accuracy: it is a shape descriptor built from the frequencies of gradient orientations in localized portions of an image. This well-known descriptor, however, does not cover rotation variations of an object in images. This paper introduces a rotation invariant feature descriptor based on HOG. The proposed descriptor is used in a top-down searching technique that covers the scale variation of objects in images. The efficiency of the method is validated by comparing its performance with existing research in a similar domain on the Caltech-256 Web dataset. The proposed method not only provides robustness against geometrical transformations of objects but is also computationally more efficient.

Introduction

Web images usually contain a high degree of background clutter and often contain multiple objects per image. To retrieve an image, Content-Based Image Retrieval (CBIR) systems [1], [2] are usually used. These systems try to retrieve images similar to a user-defined specification or pattern (e.g. a shape sketch or an example image). The algorithms used in these systems are commonly divided into three tasks: extraction, selection, and classification [3]. The extraction task transforms the rich content of images into various features; feature extraction is the process of generating the features to be used in the selection and classification tasks. Feature selection reduces the number of features provided to the classification task: those features that are likely to assist in discrimination are selected and passed on. Among these three tasks, feature extraction is the most critical, because the particular features made available for discrimination directly influence the performance of the classification task. Our study focuses on feature extraction and its effect on image ranking performance. Since a Web image usually contains several objects, representing it with global feature extraction is unlikely [4] to yield accurate object categorization.
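As a rough, generic illustration of these three tasks (not the specific components used in this paper), the Python/NumPy sketch below extracts a grey-level histogram, selects its highest-variance dimensions, and classifies a query by its nearest neighbour; all three choices are placeholders.

import numpy as np

# Generic sketch of the extraction -> selection -> classification pipeline.
# The grey-level histogram, variance-based selection and 1-NN classifier are
# placeholder choices, not the components proposed in this paper.

def extract_features(image):
    # Extraction: turn raw pixels into a fixed-length feature vector.
    hist, _ = np.histogram(image, bins=32, range=(0, 256), density=True)
    return hist

def select_features(feature_matrix, k=16):
    # Selection: keep the k dimensions most likely to help discrimination
    # (here simply the highest-variance ones).
    return np.argsort(feature_matrix.var(axis=0))[::-1][:k]

def classify(query_vec, gallery_vecs, gallery_labels):
    # Classification: assign the label of the nearest gallery vector (1-NN).
    distances = np.linalg.norm(gallery_vecs - query_vec, axis=1)
    return gallery_labels[int(np.argmin(distances))]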

In some applications the accuracy of image retrieval is of utmost importance. For instance, the user may want to know which images in the database contain a given query object (in the literature this query image is also called a template image or query object). In such cases, it is first necessary to search inside each image to find, or localize, the object. Then, to rank the image among all images in the database, a weight is assigned to it based on the similarity of the found object to the given template. The traditional solution is template matching: given a template, all possible locations in the image are scanned with a sliding window. Template matching has major drawbacks, such as its dependency on the scale and orientation of the template. Furthermore, the complexity of the search is O(n²) for an image of size n×n. To address such dependency issues, one could test all possible scales and orientations of the objects in the image, but that would be an inefficient and very time-consuming approach.
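For concreteness, the baseline looks roughly like the following Python/NumPy sketch of sliding-window template matching with normalized cross-correlation (an illustration of the generic technique, not code from the paper): every one of the O(n²) window positions is scored, and nothing in the score compensates for a rescaled or rotated object.

import numpy as np

# Minimal sliding-window template matcher over greyscale arrays.
# For an n x n image, the two nested loops visit O(n^2) candidate positions.

def match_template(image, template):
    ih, iw = image.shape
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-8)
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):            # O(n) rows ...
        for x in range(iw - tw + 1):        # ... times O(n) columns
            patch = image[y:y + th, x:x + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            score = float((p * t).mean())   # normalized cross-correlation
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score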

We propose the Rotation Invariant HOG (RIHOG) feature to cover the different possible orientations of objects in the image. To address all possible scales and locations of the object of interest, we propose a top-down searching method. By searching each image with the proposed top-down approach and using RIHOG features to compare selected regions with the template, we can find the object of interest in the database images and rank them according to their RIHOG correlation with the given template image. We refer to this ranking approach as the Rapid Ranking Method (RRM). If an image contains several objects of the same type (e.g. several guns in one image), the top-down searching window converges to the object with the highest similarity to the template image under RIHOG features. This method exhibits robustness against geometrical transformations of objects and has an efficient computational complexity. The rest of this paper is organized as follows. In the following section, we review related work. In Section 3, we explain our method and how we use it for the image ranking application. Section 4 presents experimental results.
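Since the construction of RIHOG and of the top-down search is only detailed in Section 3, the following Python/NumPy sketch merely illustrates the ranking idea in hedged form: each region is described by an orientation histogram made rotation invariant (here via the magnitude of its 1-D Fourier transform, a stand-in device and not the paper's RIHOG construction), and images are ranked by the correlation of that descriptor with the template's.

import numpy as np

# Illustrative rotation-invariant orientation descriptor and correlation ranking.
# The Fourier-magnitude trick below is an assumption made for illustration only;
# the actual RIHOG construction is defined in Section 3 of the paper.

def orientation_histogram(patch, bins=36):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def rotation_invariant_descriptor(patch, bins=36):
    h = orientation_histogram(patch, bins)
    return np.abs(np.fft.rfft(h))           # invariant to circular shifts of the histogram

def rank_images(template_patch, candidate_patches):
    # Sort candidate indices by descriptor correlation with the template.
    q = rotation_invariant_descriptor(template_patch)
    scores = [float(np.corrcoef(q, rotation_invariant_descriptor(p))[0, 1])
              for p in candidate_patches]
    return np.argsort(scores)[::-1], scores

The Fourier-magnitude device works because rotating the object circularly shifts its orientation histogram, and a circular shift only changes the phase, not the magnitude, of the histogram's discrete Fourier transform.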


Related research

Several geometric invariant feature extraction algorithms appear in the literature. Among them, BRISK [8], FREAK [9], SURF [10], and SIFT [11] are widely used in computer vision applications. Canclini et al. [12] showed that, in applications related to object recognition and retrieval, the SIFT feature outperforms the other geometrically invariant methods mentioned above in terms of true positive results. A derivative of SIFT is HOG, a well-known shape descriptor that is used in several

Proposed method

The main goal of this study is to provide a fast, accurate, and invariant object localization method to be used in object retrieval applications. For feature selection we focus on the HOG feature, which has the following attributes: it is robust against small local rotational variations of objects, and it provides a distinctive description of objects. Furthermore, since this feature uses histograms, we can decrease its extraction complexity to O(c) by using the integral form of the histogram.
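As a hedged illustration of why integral (cumulative) histograms bring the per-region cost down to a constant number of operations, the Python/NumPy sketch below precomputes one 2-D prefix sum per orientation bin, after which the histogram of any axis-aligned rectangle follows from four look-ups per bin; the gradient-based bin assignment is only a placeholder, not the exact HOG configuration used in the paper.

import numpy as np

# Minimal integral-histogram sketch (in the spirit of Porikli's integral histogram,
# cited in the references): one pass over the image, then constant-time region queries.

def build_integral_histogram(image, bins=9):
    gy, gx = np.gradient(image.astype(float))
    ang = np.mod(np.arctan2(gy, gx), np.pi)                  # unsigned orientation
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    mag = np.hypot(gx, gy)
    h, w = image.shape
    integral = np.zeros((h + 1, w + 1, bins))
    for b in range(bins):
        plane = np.where(bin_idx == b, mag, 0.0)
        integral[1:, 1:, b] = plane.cumsum(axis=0).cumsum(axis=1)
    return integral

def region_histogram(integral, y0, x0, y1, x1):
    # Histogram of rows y0..y1-1, cols x0..x1-1 via inclusion-exclusion: O(bins) time.
    return integral[y1, x1] - integral[y0, x1] - integral[y1, x0] + integral[y0, x0]

With such a table, evaluating many candidate windows during a top-down search costs the same per window regardless of window size.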

Experimental results

In this study, we use the Caltech-256 dataset [20], which consists of 30,607 Web images. Images are assigned to 256 categories and have been evaluated by humans to ensure image quality and relevance. Images usually contain several objects with non-uniform backgrounds and a variety of rotations, scales, and illuminations. The Caltech-256 dataset is used for large-scale image classification, object detection, and retrieval purposes [21], [22], [23], [24].

As Caltech-256 is a large dataset, bootstrapping [25] is

Conclusion

The main objective of this study was to find a proper representation (features) of objects in order to achieve high ranking performance in databases of Web images. Since the HOG feature has proven to be an efficient method for object detection, we focused on this feature and on how to address its rotation and scale dependency issues. To address rotation variation of objects in images, the rotation invariant RIHOG feature was introduced. Furthermore, a top-down searching method is introduced

References (37)

  • Y. Pang et al., Efficient HOG human detection, Signal Process. (2011)
  • Y. Yuan et al., Multi-spectral pedestrian detection, Signal Process. (2015)
  • R. Jafari et al., Eye-gaze estimation under various head positions and iris states, Expert Syst. Appl. (2015)
  • W.J.X. Wangming, L. Xinhai, Application of image SIFT features to the context of CBIR, in: International Conference on...
  • A. Marchiori, C. Brodley, J. Dy, C. Pavlopoulou, A. Kak, L. Broderick, A.M. Aisen, CBIR for medical images-an...
  • R. Choras, Image feature extraction techniques and their applications for CBIR and biometrics systems, Int. J. Biol. Biomed. Eng. (2007)
  • W. Niblack et al., The QBIC project: querying images by content, using color, texture, and shape, Int. Soc. Opt. Photonics (1993)
  • N. Chen et al., Fast detection of human using differential evolution, Signal Process. (2014)
  • S. Leutenegger, M. Chli, R.Y. Siegwart, BRISK: binary robust invariant scalable keypoints, in: IEEE International...
  • A. Alahi, R. Ortiz, P. Vandergheynst, FREAK: fast retina keypoint, in: IEEE Conference on Computer Vision and Pattern...
  • A.C. Murillo, J.J. Guerrero, C. Sagues, SURF features for efficient robot localization with omnidirectional images, in:...
  • L. Fan, Intra-class variation, affine transformation and background clutter: towards robust image matching, in: IEEE...
  • A. Canclini, M. Cesana, A. Redondi, M. Tagliasacchi, J. Ascenso, R. Cilla, Evaluation of low-complexity visual feature...
  • N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: International Conference on Computer...
  • K. Liu et al., Rotation invariant HOG descriptors using Fourier analysis in polar and spherical coordinates, Int. J. Comput. Vis. (2008)
  • Y. Su, Y. Wang, Rotation invariant shape contexts based on feature-space Fourier transformation, in: International...
  • L. Oswaldo, D. Delgado, Trainable classifier-fusion schemes: an application to pedestrian detection, in: IEEE...
  • F. Porikli, Integral histogram: a fast way to extract histograms in Cartesian spaces, in: IEEE Conference on Computer...