The analysis and applications of adaptive-binning color histograms
Introduction
In content-based image retrieval systems, histograms are often used to represent the distributions of colors in images. There are two general methods of generating histograms: fixed binning and adaptive binning. Typically, a fixed-binning method induces histogram bins by partitioning the color space into rectangular bins [8], [9], [21], [25], [32], [35], [38]. Once the bins are derived, they are fixed and the same binning scheme is applied to all images. On the other hand, adaptive binning adapts to the actual distributions of colors in images [3], [11], [22], [27], [31]. As a result, different binnings are induced for different images.
It is a common understanding that adaptively binned histograms can represent the distributions of colors in images more efficiently than histograms with fixed binning [11], [27], [31]. However, existing systems almost exclusively adopt fixed-binning histograms because, among existing well-known dissimilarity measures, only the Earth Mover's Distance (EMD) can compare histograms with different binnings [27], [31]. Unfortunately, EMD is computationally more expensive than other dissimilarity measures because it requires solving an optimization problem.
Another major concern is that fixed-binning histograms have been regarded as vectors in a linear vector space, with each bin representing a dimension of the space. This convenient vector interpretation makes it possible to apply various well-known algorithms, such as clustering, Principal Component Analysis, and Singular Value Decomposition, to process and analyze histograms [10], [26], [34]. Unfortunately, this approach is not satisfactory because the algorithms are applied in a linear vector space, which assumes Euclidean distance as the measure of vector difference, and Euclidean distance has been found to be less reliable than other measures for computing histogram dissimilarity [5], [27], [33]. As a result, the effectiveness and reliability of the approach are compromised.
Adaptive histograms cannot be conveniently mapped into a linear vector space because different histograms may have different bins. Although multidimensional scaling (MDS) [4] can be used to recover the Euclidean coordinates of the histograms from pairwise distances between them, it is computationally expensive to apply MDS on a large number of (say, more than 100) histograms. Moreover, MDS incurs an error in recovering the coordinates, further compromising the effectiveness of adaptive histograms in practical applications.
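The cost mentioned above can be made concrete with classical MDS, which recovers coordinates from an n × n pairwise distance matrix by double-centering and eigendecomposition, an O(n³) operation. The following NumPy sketch is illustrative only and is not the exact formulation of [4]:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Recover coordinates from an n-by-n pairwise distance matrix D.

    Classical MDS double-centers the squared distances and
    eigendecomposes the result; the O(n^3) eigendecomposition is what
    makes it impractical for large collections of histograms.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:dim] # keep the largest eigenvalues
    L = np.sqrt(np.maximum(eigvals[idx], 0.0))
    return eigvecs[:, idx] * L            # n-by-dim coordinates

# Collinear points: their distances are exactly recoverable in 1D.
X = np.array([[0.0], [1.0], [3.0]])
D = np.abs(X - X.T)
Y = classical_mds(D, dim=1)
```

When D is exactly Euclidean, as in this toy example, the recovery is exact; the recovery error mentioned above arises when the pairwise histogram dissimilarities are not Euclidean distances.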
To address the above issues, this paper proposes a new dissimilarity measure for adaptive color histograms (Section 5) that is more reliable than the Euclidean distance and yet computationally less expensive than the Earth Mover's Distance. Moreover, a mathematically sound mean histogram can be defined under this measure for histogram clustering applications. Extensive test results (Section 6) show that adaptive histograms produce the best overall performance, in terms of good accuracy, a small number of bins, no empty bins, and efficient computation, compared to existing methods in histogram retrieval, classification, and clustering tasks.
Related work
There are two types of fixed binning schemes: regular partitioning and color space clustering. The first method simply partitions the axes of a target color space into regular intervals, thus producing rectangular bins. Typically, one of the three color axes is regarded as conveying more important information and is partitioned into more intervals than are the other two axes. For example, VisualSeek [35] partitions the HSV space into 18 × 3 × 3 color bins and 4 grey bins, producing 166 bins.
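A regular-partitioning scheme of this kind can be sketched as a simple bin-index function. The sketch below follows the 18 × 3 × 3 color bins plus 4 grey bins (166 bins total) of the VisualSeek scheme, but the grey/color split rule and its threshold are illustrative assumptions, not VisualSeek's exact definition:

```python
def hsv_bin_index(h, s, v, grey_sat=0.1):
    """Map an HSV color (h in [0, 360), s and v in [0, 1]) to one of
    166 fixed bins: 18 x 3 x 3 = 162 rectangular color bins plus
    4 grey bins for low-saturation colors.

    The grey_sat threshold and the bin ordering are assumptions made
    for illustration only.
    """
    if s < grey_sat:                   # achromatic: 4 grey bins by value
        return 162 + min(int(v * 4), 3)
    hi = min(int(h / 20), 17)          # 18 hue intervals of 20 degrees
    si = min(int(s * 3), 2)            # 3 saturation intervals
    vi = min(int(v * 3), 2)            # 3 value intervals
    return (hi * 3 + si) * 3 + vi

red_bin = hsv_bin_index(0.0, 1.0, 1.0)    # a saturated red
white_bin = hsv_bin_index(0.0, 0.0, 1.0)  # white falls in a grey bin
```

Note that hue, carrying the most perceptually important information, gets six times as many intervals as saturation or value.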
Adaptive binning
Adaptive binning of the colors in an image can be achieved by an appropriate vector quantization algorithm such as k-means clustering or its variants [24]. This section describes an adaptive variant of k-means that can automatically determine the appropriate number of clusters required. The algorithm can be summarized as follows:
Adaptive Clustering
Repeat
    For each pixel p,
        Find the nearest cluster k to pixel p.
        If no cluster is found or distance d_kp ≥ S,
            create a new cluster with pixel p;
        Else, if d_kp …
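The truncated pseudocode above can be sketched in Python as follows. The centroid-update rule (recomputing each centroid as the mean of its assigned pixels on every pass) and the convergence test are assumptions, since the snippet omits the Else branch:

```python
import math

def adaptive_cluster(pixels, S, max_iter=10):
    """Adaptive k-means sketch: a new cluster is created whenever a
    pixel lies at distance >= S from every existing centroid, so the
    number of clusters adapts to the data instead of being fixed."""
    centroids, counts = [], []
    for _ in range(max_iter):
        sums = [[0.0] * len(pixels[0]) for _ in centroids]
        counts = [0] * len(centroids)
        created = False
        for p in pixels:
            # find the nearest existing cluster k to pixel p
            best, best_d = -1, float("inf")
            for k, c in enumerate(centroids):
                d = math.dist(p, c)
                if d < best_d:
                    best, best_d = k, d
            if best < 0 or best_d >= S:
                # no cluster found or too far away: start a new cluster
                centroids.append(list(p))
                sums.append(list(p))
                counts.append(1)
                created = True
            else:
                # assumed Else branch: assign p to the nearest cluster
                counts[best] += 1
                for i, v in enumerate(p):
                    sums[best][i] += v
        # recompute centroids as means; drop clusters left empty
        pairs = [(s, c) for s, c in zip(sums, counts) if c > 0]
        new_centroids = [[v / c for v in s] for s, c in pairs]
        counts = [c for _, c in pairs]
        moved = len(new_centroids) != len(centroids) or any(
            math.dist(a, b) > 1e-9 for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if not created and not moved:
            break
    return centroids, counts

# two well-separated color groups should yield two clusters
colors = [(10, 10, 10), (12, 9, 11), (200, 50, 50), (198, 52, 48)]
cents, cnts = adaptive_cluster(colors, S=30.0)
```

With threshold S = 30, the two tight color groups each spawn one cluster on the first pass and the centroids stabilize on the second.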
Overview of histogram similarity
Before discussing the mathematics of adaptive histograms, let us motivate the mathematical formulation by first describing a possible definition of similarity measure for adaptive color histograms. To begin, consider two adaptive histograms H and H′, each having only one bin, located at c and c′, with bin counts h and h′, respectively. Let f(x) and f′(x) denote the actual density distributions of colors in and around the two bins, where x denotes 3D color coordinates. Then, the …
Adaptive color histograms
An adaptive color histogram is defined as a 3-tuple consisting of a set of n bins with bin centroids c_i, i = 1, …, n, and a set of corresponding bin counts h_i > 0. Given two adaptive histograms G and H, define the weighted correlation G·H as in Eq. (6):

    G·H = Σ_i Σ_j w_ij g_i h_j,

where g_i and h_j are the bin counts of G and H and w_ij is a weight defined on the pair of bin centroids.
A histogram H can be normalized into Ĥ by dividing each bin count by the histogram norm ‖H‖ = √(H·H). The similarity s(G, H) between G and H is then defined as the weighted correlation of the normalized histograms, s(G, H) = Ĝ·Ĥ.
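A minimal sketch of this similarity follows. The weight function used here, a linear falloff with centroid distance, is an illustrative assumption; the paper instead derives the weights from the volumes of intersection of overlapping bins:

```python
import math

def weighted_corr(G, H, w):
    """Weighted correlation G.H of two adaptive histograms, each given
    as a list of (bin centroid, bin count) pairs."""
    return sum(g * h * w(cg, ch) for cg, g in G for ch, h in H)

def similarity(G, H, w):
    """s(G, H): weighted correlation of the normalized histograms,
    using the norm |H| = sqrt(H.H)."""
    return weighted_corr(G, H, w) / math.sqrt(
        weighted_corr(G, G, w) * weighted_corr(H, H, w))

# Illustrative weight: 1 when bins coincide, 0 beyond radius R.
R = 50.0
def w(c1, c2):
    return max(0.0, 1.0 - math.dist(c1, c2) / R)

# Two one-bin histograms with nearby bins are highly similar even
# though their bins do not coincide, unlike with Euclidean distance.
G = [((10.0, 10.0, 10.0), 5.0)]
H = [((20.0, 10.0, 10.0), 7.0)]
s = similarity(G, H, w)   # 35 * 0.8 / (5 * 7) = 0.8
```

The key property is that the cross-bin weights let histograms with entirely different binnings be compared directly, without mapping them into a common vector space.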
Performance evaluation
Four types of tests were conducted to evaluate the performance of adaptive color histograms and the weighted correlation dissimilarity measure: color retention, image retrieval, image classification, and image clustering.
Conclusions
This paper presented an adaptive color clustering method and a dissimilarity measure for comparing histograms with different binnings. The color clustering algorithm is an adaptive variant of the k-means clustering algorithm, and it can determine the number of clusters required to adequately describe the colors in an image. The dissimilarity measure computes a weighted correlation between two histograms, and the weights are defined in terms of the volumes of intersection between overlapping bins.
Acknowledgements
This research is supported by NUS ARF R-252-000-072-112 and NSTB UPG/98/015.
References (38)
- et al., Histograms analysis for image retrieval, Pattern Recogn. (2001)
- et al., Entropy differential metric, distance and divergence measures in probability spaces: a unified approach, J. Multivariate Anal. (1982)
- et al., A relevance feedback mechanism for content-based image retrieval, Inf. Process. Manage. (1999)
- et al., Color matching for image retrieval, Pattern Recogn. Lett. (1995)
- et al., Content-based image retrieval using a composite color-shape approach, Inf. Process. Manage. (1998)
- et al., On image classification: city images vs. landscapes, Pattern Recogn. (1998)
- IEC 61966-2.1, Default RGB Colour Space - sRGB, International Electrotechnical Commission, Geneva, Switzerland (1999)
- Billmeyer and Saltzman's Principles of Color Technology (2000)
- et al., Image retrieval using fuzzy evaluation of color similarity, Int. J. PR AI (1994)
- et al., Modern Multidimensional Scaling (1997)
- On the convexity of some divergence measures based on entropy functions, IEEE Trans. Inf. Theory
- Investigation of parametric effects using small colour differences, Color Res. Appl.
- Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. PAMI
- Predictions based on Munsell notation. I. Perceptual color differences, Color Res. Appl.
- Theory of Probability
- Clustering by means of medoids