The analysis and applications of adaptive-binning color histograms

https://doi.org/10.1016/j.cviu.2003.10.010

Abstract

Histograms are commonly used in content-based image retrieval systems to represent the distributions of colors in images. It is a common understanding that histograms that adapt to images can represent their color distributions more efficiently than do histograms with fixed binnings. However, existing systems almost exclusively adopt fixed-binning histograms because, among existing well-known dissimilarity measures, only the computationally expensive Earth Mover’s Distance (EMD) can compare histograms with different binnings. This paper addresses the issue by defining a new dissimilarity measure that is more reliable than the Euclidean distance and yet computationally less expensive than EMD. Moreover, a mathematically sound mean histogram can be defined for histogram clustering applications. Extensive test results show that adaptive histograms produce the best overall performance, in terms of good accuracy, a small number of bins, no empty bins, and efficient computation, compared to existing methods for histogram retrieval, classification, and clustering tasks.

Introduction

In content-based image retrieval systems, histograms are often used to represent the distributions of colors in images. There are two general methods of generating histograms: fixed binning and adaptive binning. Typically, a fixed-binning method induces histogram bins by partitioning the color space into rectangular bins [8], [9], [21], [25], [32], [35], [38]. Once the bins are derived, they are fixed and the same binning scheme is applied to all images. On the other hand, adaptive binning adapts to the actual distributions of colors in images [3], [11], [22], [27], [31]. As a result, different binnings are induced for different images.

It is a common understanding that adaptively binned histograms can represent the distributions of colors in images more efficiently than do histograms with fixed binning [11], [27], [31]. However, existing systems almost exclusively adopt fixed-binning histograms because, among existing well-known dissimilarity measures, only the Earth Mover’s Distance (EMD) can compare histograms with different binnings [27], [31]. Yet EMD is computationally more expensive than other dissimilarity measures because it requires solving an optimization problem.

Another major concern is that fixed-binning histograms have been treated as vectors in a linear vector space, with each bin representing one dimension of the space. This convenient vector interpretation makes it possible to apply various well-known algorithms, such as clustering, Principal Component Analysis, and Singular Value Decomposition, to process and analyze histograms [10], [26], [34]. Unfortunately, this approach is not satisfactory because these algorithms operate in a linear vector space, which assumes the Euclidean distance as the measure of vector difference, and the Euclidean distance has been found to be less reliable than other measures for computing histogram dissimilarity [5], [27], [33]. As a result, the effectiveness and reliability of the approach are compromised.

Adaptive histograms cannot be conveniently mapped into a linear vector space because different histograms may have different bins. Although multidimensional scaling (MDS) [4] can recover Euclidean coordinates for the histograms from the pairwise distances between them, it is computationally expensive to apply MDS to a large number (say, more than 100) of histograms. Moreover, the coordinates recovered by MDS contain errors, further compromising the effectiveness of adaptive histograms in practical applications.

To address the above issues, this paper proposes a new dissimilarity measure for adaptive color histograms (Section 5) that is more reliable than the Euclidean distance and yet computationally less expensive than the Earth Mover’s Distance. Moreover, a mathematically sound mean histogram can be defined for histogram clustering applications. Extensive test results (Section 6) show that adaptive histograms produce the best overall performance, in terms of good accuracy, a small number of bins, no empty bins, and efficient computation, compared to existing methods in histogram retrieval, classification, and clustering tasks.

Section snippets

Related work

There are two types of fixed-binning schemes: regular partitioning and color space clustering. The first method simply partitions the axes of a target color space into regular intervals, thus producing rectangular bins. Typically, one of the three color axes is regarded as conveying more important information and is partitioned into more intervals than the other two. For example, VisualSeek [35] partitions the HSV space into 18 × 3 × 3 color bins and 4 grey bins, producing 166 bins.
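Regular partitioning can be made concrete with a short sketch. The code maps an HSV color to one of 166 bins, following the VisualSeek layout of 18 hue × 3 saturation × 3 value chromatic bins plus 4 grey bins. The equal-width intervals and the saturation threshold `GREY_SAT`, below which a pixel counts as grey, are assumptions made for illustration. They are not VisualSeek's published parameters, and all function names are hypothetical.

```python
# Hypothetical sketch of a VisualSeek-style fixed binning of HSV space:
# 18 hue x 3 saturation x 3 value chromatic bins plus 4 grey bins = 166.
# Equal-width intervals and the grey threshold are illustrative assumptions.

N_HUE, N_SAT, N_VAL, N_GREY = 18, 3, 3, 4
N_BINS = N_HUE * N_SAT * N_VAL + N_GREY  # 162 + 4 = 166
GREY_SAT = 0.1  # hypothetical: saturation below this is treated as grey


def _interval(x: float, n: int) -> int:
    """Index of x in [0, 1] after splitting [0, 1] into n equal intervals."""
    return min(int(x * n), n - 1)


def hsv_bin(h: float, s: float, v: float) -> int:
    """Map an HSV color (h in [0, 360), s and v in [0, 1]) to a bin index."""
    if s < GREY_SAT:
        # Grey bins follow the 162 chromatic bins and depend only on value.
        return N_HUE * N_SAT * N_VAL + _interval(v, N_GREY)
    hi = _interval((h % 360.0) / 360.0, N_HUE)
    si = _interval(s, N_SAT)
    vi = _interval(v, N_VAL)
    return (hi * N_SAT + si) * N_VAL + vi


def fixed_histogram(pixels):
    """Count HSV pixels into the same 166 bins, whatever the image."""
    counts = [0] * N_BINS
    for h, s, v in pixels:
        counts[hsv_bin(h, s, v)] += 1
    return counts
```

Every image receives all 166 bins whether or not its colors occupy them. This is why fixed binnings tend to produce many empty bins for images with few distinct colors.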

Adaptive binning

Adaptive binning of the colors in an image can be achieved by an appropriate vector quantization algorithm such as k-means clustering or its variants [24]. This section describes an adaptive variant of k-means that can automatically determine the number of clusters required. The algorithm can be summarized as follows:

Adaptive Clustering

  • Repeat

    • For each pixel p,

      • Find the nearest cluster k to pixel p.

      • If no cluster is found or distance $d_{kp} \geq S$,

        • create a new cluster with pixel p;

      • Else, if d
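The listing above breaks off at the second distance test. The sketch below fills in the missing steps with our own assumptions, not the paper's:

- A pixel within a hypothetical radius `R` of its nearest cluster joins that cluster.
- Pixels lying between `R` and `S` wait for a later pass.
- After each pass, centroids are recomputed as member means.
- Clusters smaller than a hypothetical `min_size` are dropped.
- The loop stops once a pass creates no new cluster and leaves the centroids unchanged, or after `max_iter` passes.

Only the rule that creates a new cluster when $d_{kp} \geq S$ comes from the text.

```python
import math

Color = tuple[float, float, float]


def adaptive_kmeans(pixels: list[Color], S: float, R: float,
                    min_size: int = 1, max_iter: int = 20):
    """Adaptive k-means sketch; returns (centroid, count) pairs.

    S is from the text: a pixel at distance >= S from every cluster seeds
    a new cluster. R (assumed R <= S), min_size and max_iter are
    hypothetical parameters standing in for the truncated steps.
    """
    centroids: list[Color] = []
    clusters: list[list[Color]] = []
    for _ in range(max_iter):
        members: list[list[Color]] = [[] for _ in centroids]
        created = False
        for p in pixels:
            # Find the nearest cluster k to pixel p (d is inf if none exists).
            k, d = -1, math.inf
            for i, c in enumerate(centroids):
                dist = math.dist(p, c)
                if dist < d:
                    k, d = i, dist
            if d >= S:
                # No cluster found, or too far away: create a new cluster with p.
                centroids.append(p)
                members.append([p])
                created = True
            elif d <= R:
                # Assumed rule for the truncated 'Else' branch.
                members[k].append(p)
            # Pixels with R < d < S stay unassigned in this pass (assumption).
        # Recompute centroids as member means; drop small clusters (assumption).
        kept = [m for m in members if len(m) >= min_size]
        new_centroids = [tuple(sum(axis) / len(m) for axis in zip(*m)) for m in kept]
        converged = not created and new_centroids == centroids
        centroids, clusters = new_centroids, kept
        if converged:
            break
    return [(c, len(m)) for c, m in zip(centroids, clusters)]
```

Calling `adaptive_kmeans(pixels, S=20, R=10)` on 3D color tuples returns one (centroid, count) pair per surviving cluster. Each image therefore gets its own set of bins. The values 20 and 10 are placeholders, not thresholds taken from the paper.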

Overview of histogram similarity

Before discussing the mathematics of adaptive histograms, let us motivate the mathematical formulation by first describing a possible definition of a similarity measure for adaptive color histograms. To begin, consider two adaptive histograms $H$ and $H'$, each having only one bin, located at $c$ and $c'$, with bin counts $h$ and $h'$, respectively. Let $f(x)$ and $f'(x)$ denote the actual density distributions of colors in and around the two bins, where $x$ denotes 3D color coordinates. Then, the

Adaptive color histograms

An adaptive color histogram $H = (n, \mathcal{C}, \mathcal{H})$ is defined as a 3-tuple consisting of a set $\mathcal{C}$ of $n$ bins with bin centroids $c_i$, $i = 1, \dots, n$, and a set $\mathcal{H}$ of corresponding bin counts $h_i > 0$. Given two adaptive histograms $G = (m, \{b_i\}, \{g_i\})$ and $H = (n, \{c_j\}, \{h_j\})$, define the weighted correlation $G \cdot H$ as

$$G \cdot H = \sum_{i=1}^{m} \sum_{j=1}^{n} w(b_i, c_j)\, g_i\, h_j. \qquad (6)$$
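The snippet does not show how the weight $w(b_i, c_j)$ in Eq. (6) is computed. The conclusions say only that the weights are defined through volumes of intersection between overlapping regions. One reading, assumed here and not confirmed by the visible text, runs as follows:

- Each bin is modeled as a sphere of a common radius $R$ (a hypothetical parameter) centered on its centroid.
- $w$ is the fraction of one sphere's volume that it shares with the other.

With $d = \|b - c\|$, the standard lens volume of two equal spheres gives the weight below:

$$V_\cap(d) = \frac{\pi (4R + d)(2R - d)^2}{12}, \qquad 0 \le d \le 2R,$$

$$w(b, c) = \frac{V_\cap(d)}{\tfrac{4}{3}\pi R^3} = \frac{(4R + d)(2R - d)^2}{16R^3} = 1 - \frac{3d}{4R} + \frac{d^3}{16R^3}, \qquad w(b, c) = 0 \text{ for } d > 2R.$$

Under this assumption, $w$ behaves as follows:

- It equals 1 for coincident bins.
- It decreases monotonically in $d$, because $dw/dd = 3(d^2 - 4R^2)/(16R^3) \le 0$ on $[0, 2R]$.
- It vanishes once the spheres stop overlapping.

Evaluating Eq. (6) therefore needs only $mn$ closed-form weight evaluations and no optimization step.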

A histogram $H$ can be normalized into $\hat{H}$ by dividing each bin count by the histogram norm $\|H\| = \sqrt{H \cdot H}$. The similarity $s(G, H)$ between $G$ and $H$ is then defined as the weighted correlation
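The snippet breaks off before stating $s(G, H)$. A natural completion, assumed here, is $s(G, H) = \hat{G} \cdot \hat{H}$, the weighted correlation of the two normalized histograms.

The sketch below also assumes a sphere-overlap weight, $w = 1 - 3d/(4R) + d^3/(16R^3)$ for $d < 2R$ and 0 otherwise, where $R$ is a hypothetical bin radius. Histograms are represented as lists of (centroid, count) pairs. All function names are illustrative.

```python
import math

Color = tuple[float, float, float]
Histogram = list[tuple[Color, float]]  # (bin centroid, bin count > 0)


def sphere_weight(b: Color, c: Color, R: float) -> float:
    """Assumed weight: overlap volume of two radius-R spheres / sphere volume."""
    d = math.dist(b, c)
    if d >= 2 * R:
        return 0.0
    return 1 - 3 * d / (4 * R) + d ** 3 / (16 * R ** 3)


def correlation(G: Histogram, H: Histogram, R: float) -> float:
    """Weighted correlation: sum over all bin pairs of w(b_i, c_j) * g_i * h_j."""
    return sum(sphere_weight(b, c, R) * g * h for b, g in G for c, h in H)


def normalize(H: Histogram, R: float) -> Histogram:
    """Divide every bin count by the histogram norm ||H|| = sqrt(H . H)."""
    norm = math.sqrt(correlation(H, H, R))
    return [(c, h / norm) for c, h in H]


def similarity(G: Histogram, H: Histogram, R: float) -> float:
    """Assumed completion of the truncated text: s(G, H) = G_hat . H_hat."""
    return correlation(normalize(G, R), normalize(H, R), R)
```

The two histograms may have different numbers of bins at different positions, and no transportation problem is solved. Because the weights are non-negative and $w = 1$ on the diagonal, $H \cdot H > 0$, so the norm is always well defined.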

Performance evaluation

Four types of tests were conducted to evaluate the performance of adaptive color histograms and the weighted-correlation dissimilarity measure: color retention, image retrieval, image classification, and image clustering.

Conclusions

This paper presented an adaptive color clustering method and a dissimilarity measure for comparing histograms with different binnings. The color clustering algorithm is an adaptive variant of the k-means clustering algorithm that can determine the number of clusters required to adequately describe the colors in an image. The dissimilarity measure computes a weighted correlation between two histograms, and the weights are defined in terms of the volumes of intersection between overlapping

Acknowledgements

This research is supported by NUS ARF R-252-000-072-112 and NSTB UPG/98/015.

References (38)

  • J. Burbea et al., On the convexity of some divergence measures based on entropy functions, IEEE Trans. Inf. Theory (1982).
  • I.J. Cox, M.L. Miller, S.O. Omohundro, P.N. Yianilos, PicHunter: Bayesian relevance feedback for image retrieval, in:...
  • C.Y. Fung, K.F. Loe, Learning primitive and scene semantics of images for classification and retrieval, in: Proc. ACM...
  • Y. Gong, G. Proietti, C. Faloutsos, Image indexing and retrieval based on human perceptual color clustering, in: Proc....
  • S.-S. Guan et al., Investigation of parametric effects using small colour differences, Color Res. Appl. (1999).
  • J. Hafner et al., Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. PAMI (1995).
  • T. Indow, Predictions based on Munsell notation. I. Perceptual color differences, Color Res. Appl. (1999).
  • H. Jeffreys, Theory of Probability (1948).
  • L. Kaufmann et al., Clustering by means of medoids.

  • Cited by (46)

    • Fisher encoding of differential fast point feature histograms for partial 3D object retrieval

      2016, Pattern Recognition
      Citation Excerpt :

      SQFD starts from Harris 3D keypoints to build a feature set composed of normalized local descriptors. Next, a local clustering algorithm [39] is applied to obtain a set of representative descriptors. Shape matching is performed with the signature quadratic form distance.

    • Fast K-means algorithm based on a level histogram for image retrieval

      2014, Expert Systems with Applications
      Citation Excerpt :

      The K-means algorithm is simple but requires considerable time spent in training groups, and the performance results depend on the merits of that group. Leow and Li (2004) defined a new dissimilarity measure that was more reliable than the Euclidean distance, yet computationally less expensive than the Earth Mover’s distance. Moreover, a mathematically sound definition of a mean histogram can be defined for histogram clustering applications.

    • Ptolemaic access methods: Challenging the reign of the metric space model

      2013, Information Systems
      Citation Excerpt :

      For each cloud, its center was generated at random, while other 10 000 points were generated under normal distribution around the center (the mean and variance in each dimension were adjusted to not generate points outside the unitary cube). Then an adaptive variant of the k-means clustering [50] was used to create 20–40 centroids representing the original data. The weight of each centroid corresponded to the number of points assigned to the centroid in the last iteration of the k-means clustering.

    • Data-aware 3D partitioning for generic shape retrieval

      2013, Computers and Graphics (Pergamon)
    • The effect of color in Airbnb listings on guest ratings

      2021, Advances in Hospitality and Tourism Research