Original articles
Clustered entropy for edge detection

https://doi.org/10.1016/j.matcom.2020.11.021

Abstract

The quantity of information in images can be evaluated by means of the Shannon entropy. When dealing with natural images with a large scale of gray levels, as well as with images containing textures or suffering some degradation such as noise or blurring, this measure tends to saturate. That is, it reaches high values due to a large amount of irrelevant information, making it useless for measuring significant information. In this paper we present a new information measure, the clustered entropy. This information measure, based on clustering local histograms, has a zero value for quasi-homogeneous regions and reaches high values for regions containing edges. The clustered entropy is used in this paper as an edge detector, by centering a sliding window on every pixel of an image and calculating the clustered entropy of the corresponding histogram. A search for local maxima throughout the resulting matrix of entropies provides the final image of edges. The mathematical properties of the clustered entropy are studied, a comparison between the clustered and the classic entropy is made, and some comparative experiments of edge detection are shown in this paper.

Introduction

In 1949 Claude E. Shannon proposed a measure of the information provided by a probabilistic experiment. In fact, the information obtained from an experiment is equal to the uncertainty eliminated upon accomplishing it. Let X be a random experiment with $K+1$ possible results $x_0, x_1, \ldots, x_K$ with respective occurrence probabilities $P = \{p_0, p_1, \ldots, p_K\}$,
$$p_i = \Pr(X = x_i) \ge 0, \quad i = 0, 1, \ldots, K, \qquad \sum_{i=0}^{K} p_i = 1.$$
The Shannon entropy [7] is given by
$$H(P) = \sum_{i=0}^{K} \phi(p_i),$$
where $\phi(x) = -x \log x$ if $x > 0$, and $\phi(x) = 0$ otherwise by convention. By natural extension, the entropy of a distribution of absolute frequencies $P = \{f_0, f_1, \ldots, f_K\}$ is defined as
$$H(P) = \sum_{i=0}^{K} \phi(p_i) = \frac{1}{N}\left(\sum_{i=0}^{K} \phi(f_i) - \phi(N)\right), \quad \text{where } p_i = \frac{f_i}{N} \text{ and } N = \sum_{i=0}^{K} f_i.$$
Where appropriate, any distribution $P$ of probabilities (or absolute frequencies) can be considered as extended to an unlimited gray scale in $\mathbb{Z}$ by enlarging it with null probabilities (or null frequencies). The corresponding Shannon entropy can then be defined as
$$H(P) = \sum_{i \in \mathbb{Z}} \phi(p_i).$$
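The two equivalent forms of the entropy can be checked numerically. The following sketch (with function names of our own choosing) verifies that the frequency-based formula agrees with the probability-based one:

```python
import math

def phi(x):
    """phi(x) = -x * log(x) for x > 0, and 0 otherwise (by convention)."""
    return -x * math.log(x) if x > 0 else 0.0

def entropy_probs(p):
    """Shannon entropy of a probability distribution."""
    return sum(phi(pi) for pi in p)

def entropy_freqs(f):
    """Shannon entropy of an absolute-frequency histogram,
    using H(P) = (1/N) * (sum_i phi(f_i) - phi(N))."""
    n = sum(f)
    return (sum(phi(fi) for fi in f) - phi(n)) / n

freqs = [2, 3, 5]                  # absolute frequencies, N = 10
probs = [fi / 10 for fi in freqs]  # normalized: [0.2, 0.3, 0.5]
assert abs(entropy_probs(probs) - entropy_freqs(freqs)) < 1e-12
```

Note that a degenerate histogram (all mass on one level) gives zero entropy under both forms, as expected.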

Image segmentation is a main task in image processing. It is a process that divides the image into regions with a certain meaning for the observer and is commonly a required step in image analysis. Segmentation algorithms are generally based on two properties:

  • Discontinuity: An object in an image is distinguished from the background or other objects by means of a discontinuous change in the composition of gray levels along the boundary of the object.

  • Homogeneity: The objects of the scene are formed by internally homogeneous parts that correspond to regions of constant composition in the image.

In practice, these properties are rarely fulfilled due to several causes, such as a variation of intensity in the interior of the objects, blurred contours and edges, a non-uniform background, the presence of textures, noise, changes in brightness, degradation, etc.

A widely used technique as a preliminary step to achieving segmentation is edge detection. There are several methods of edge detection, many of them based on gradients. The gradient of a function at a point indicates the direction of maximum variation of the function in the neighborhood of that point, and the modulus of the gradient measures the magnitude of this variation. The contours of objects in an image should present a high gradient, while homogeneous regions present a low gradient [9], [13]. The main methods based on gradients are: the Roberts operator, based on a 2 × 2 mask; the Sobel operator, less biased and based on a 3 × 3 mask; and the Canny filter [4], very popular and more sophisticated and robust against Gaussian noise.

Amongst methods not based on gradients, it is worth mentioning the Jensen–Shannon divergence JS [2], [11], which uses a double sliding window. A window W is divided into two equal subwindows W1 and W2; for each position of W, the histograms P1 and P2 of W1 and W2, respectively, are calculated as absolute frequencies of gray levels, and the corresponding JS divergence is computed. A search for local maxima in the resulting divergence matrix provides an image of the edges.

The unweighted Jensen–Shannon divergence between two probability distributions P and Q is defined by
$$JS(P, Q) = H\!\left(\tfrac{1}{2}P + \tfrac{1}{2}Q\right) - \tfrac{1}{2}H(P) - \tfrac{1}{2}H(Q),$$
where $H(\cdot)$ is the Shannon entropy.
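As a small illustration (the function names are ours, not from the paper), the definition can be implemented directly over two histograms of the same gray-level range:

```python
import math

def phi(x):
    return -x * math.log(x) if x > 0 else 0.0

def entropy(p):
    return sum(phi(pi) for pi in p)

def js_divergence(p, q):
    """Unweighted Jensen-Shannon divergence between two probability
    distributions over the same gray-level range."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return entropy(m) - entropy(p) / 2 - entropy(q) / 2

# Identical windows give zero divergence; disjoint supports give log 2.
assert js_divergence([1.0, 0.0], [1.0, 0.0]) == 0.0
assert abs(js_divergence([1.0, 0.0], [0.0, 1.0]) - math.log(2)) < 1e-12
```

The divergence is zero when both subwindows have the same histogram (interior of a region) and grows when the two halves straddle an edge.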

In image processing, the entropy of the normalized gray-level histogram of a homogeneous sample of the image should be low (less information), whilst in non-homogeneous samples, in particular those containing edges between different regions, the entropy should be high (more information).

Since the Jensen–Shannon divergence is an entropic measure useful for obtaining an image segmentation, a question arises in a natural way: why not use the local entropy of the histogram (the entropy of a window centered on a pixel) as an edge detector? The following arguments can be given:

  • H is computationally less expensive than the JS divergence,

  • it only requires a single sliding window,

  • it is invariant against permutations and displacements of gray levels,

  • it has a value of zero in homogeneous zones, and a high value in regions with several gray levels (namely, regions containing edges).

These reasons suggest that H could be used as an edge detector. Fig. 1 shows such an experiment. The synthetic image (a) is the original, in which all the regions are homogeneous, that is, each has a unique gray level. Image (b) is a graphical representation of the entropy values of a circular sliding window of radius 3 centered on every pixel of the original image; a white pixel means zero entropy and a black pixel means maximum entropy. All the entropy images in this paper have been normalized so that the black pixels correspond to the maximum value found in the matrix. As shown, the edges have been clearly detected. Moreover, the entropy has a high value along all the edges, regardless of the gray levels of the adjacent regions. This does not happen with edge detectors based on gradient operators, whose response increases with the difference between gray levels.
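A minimal sketch of this sliding-window computation might look as follows; the plain-Python loop and function names are illustrative, not the paper's implementation:

```python
import numpy as np

def phi(x):
    return np.where(x > 0, -x * np.log(x), 0.0)

def local_entropy(img, radius=3):
    """Entropy of the gray-level histogram inside a circular window
    of the given radius centered on every interior pixel."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    mask = (yy ** 2 + xx ** 2) <= radius ** 2      # circular window
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            patch = img[i - radius:i + radius + 1, j - radius:j + radius + 1]
            vals, counts = np.unique(patch[mask], return_counts=True)
            p = counts / counts.sum()
            out[i, j] = phi(p).sum()
    return out

# Two homogeneous regions: the entropy is zero inside each region and
# strictly positive along the vertical edge between them.
img = np.zeros((20, 20), dtype=np.uint8)
img[:, 10:] = 128
ent = local_entropy(img)
assert ent[10, 3] == 0.0 and ent[10, 16] == 0.0
assert ent[10, 10] > 0.0
```

The toy image reproduces the behavior described above: zero entropy in homogeneous zones, high entropy on edges, independently of which two gray levels meet.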

It turns out, however, that in practice the regions are in most cases not completely homogeneous, and consequently the entropy of a sliding window may not be as useful as in the above example. Fig. 2 shows such a situation. Images (a) and (b) are the same as in Fig. 1, but affected by Gaussian noise (zero mean and deviations 5 and 10, respectively). Images (c) and (d) are the corresponding entropy matrices. In both cases a circular sliding window of radius 3 was used. A human observer can still see without any difficulty the same regions as in the original image. However, the matrices of entropies in images (c) and (d) are now very different: high entropies arise everywhere, making edge detection more difficult than in the previous experiment. In the case of (d), edge detection becomes impossible. As a curiosity, the lighter background (lower entropies) in the upper central circle of the entropy matrices has a very simple reason: the original circle is white, so the Gaussian noise affects only half of the pixels inside, allowing less variety of resulting gray levels and hence a lower entropy.

Noise is not the only factor that can make the entropy grow; this can also occur with any scattering of gray levels, such as shading, as shown in Fig. 3. The background in image (a) is a continuous shading in which the gray level of every pixel is given by its x-coordinate; the central band is again composed of vertical lines of lighter gray levels that increase from left to right. As can be seen in the matrix of entropies shown in image (b), all the sliding windows gave exactly the same entropy regardless of their location, thus making any edge detection impossible.

In general, what happens in the experiments shown in Fig. 2, Fig. 3 can be called ‘saturation’ of the entropy due to gray level scattering. In other words, the entropy of a sliding window can reach a high value if the window contains a wide variety of gray levels, which can be due to two main reasons: (1) the presence of different objects, and (2) the scattering of a unique gray level into several levels. In the first case the sliding window is near an edge and a high entropy is expected, but in the second case the entropy, which we expect to be low, becomes saturated and is useless for detecting edges.

A possible solution to this problem could be reducing the scale to an adequate number of levels, also known as quantization. Another possible solution consists of correcting the dispersion of the original gray levels without losing details or underlying shapes in the image. This correction must gather the frequencies of the scattered gray levels into a unique frequency at a representative level in each region.
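A uniform quantization of this kind can be sketched in a few lines; the bin width and representative values below are illustrative choices, not those of the paper:

```python
import numpy as np

def quantize(img, levels=8):
    """Uniformly requantize an 8-bit image to the given number of
    gray levels, mapping each bin to its central representative value."""
    step = 256 // levels
    return (img // step) * step + step // 2

# Five gray values scattered around 140 by noise.
noisy = np.array([136, 139, 140, 142, 145])
# With 8 levels (bins of width 32), all five scattered values fall
# into the same bin and collapse to a single representative level.
assert len(set(quantize(noisy, 8))) == 1
```

Quantization is cheap but blind: it can also merge two genuinely distinct regions whose gray levels happen to fall in the same bin, which is why the histogram-driven correction described next is preferable.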

This study presents a way to solve the saturation problem in the second case, making the entropy worth using as an edge detector. The structure of the paper is as follows. Section 2 describes the CLOSE [1] algorithm to correct gray level scattering. Section 3 defines the clustered entropy and analyzes some of its properties. Section 4 presents a new technique for edge detection by means of the clustered entropy, and Section 5 presents some illustrative experiments. Conclusions are drawn in Section 6.


Gray level histogram clustering

In this Section, the CLOSE (Clustering by LOcal SEparation) algorithm is described. It is a method to partition a histogram of gray levels (either normalized or not) into a set of sub-histograms called clusters, depending on a parameter called the scattering limit. Once an original histogram has been partitioned, each cluster in the set can be transformed into a degenerate sub-histogram having its unique non-zero
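The published CLOSE algorithm is defined in [1]; as an illustration only, a much-simplified gap-based clustering with the same interface (partition by a scattering limit, then collapse each cluster into a degenerate sub-histogram) might look like this:

```python
import numpy as np

def cluster_histogram(hist, scattering_limit=3):
    """Illustrative stand-in for CLOSE-style clustering (the real
    algorithm is defined in [1]): split the histogram into clusters
    of nearby occupied gray levels, then collapse each cluster into
    a degenerate sub-histogram at a representative level.

    Here a new cluster starts whenever the gap between consecutive
    occupied levels exceeds the scattering limit; the representative
    is the cluster's frequency-weighted mean level. Both choices are
    simplifying assumptions, not the published CLOSE criteria."""
    occupied = np.flatnonzero(hist)
    clustered = np.zeros_like(hist)
    if occupied.size == 0:
        return clustered
    # Split where the gap between consecutive occupied levels is large.
    breaks = np.where(np.diff(occupied) > scattering_limit)[0] + 1
    for group in np.split(occupied, breaks):
        freqs = hist[group]
        rep = int(round(np.average(group, weights=freqs)))
        clustered[rep] += freqs.sum()
    return clustered

# Gray levels 100..104, scattered by noise, collapse to one level.
hist = np.zeros(256, dtype=int)
hist[100:105] = [2, 5, 9, 4, 1]
c = cluster_histogram(hist)
assert np.count_nonzero(c) == 1 and c.sum() == hist.sum()
```

The total frequency mass is preserved; only the scattering within each cluster is removed.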

Clustered entropy

In this Section, a new measure of information is presented, by using the CLOSE algorithm. The term clustered entropy has already been used with a related sense [14]. Here, the clustered entropy of a given histogram is defined as the Shannon entropy of the corresponding clustered histogram (see Section 2). Since a clustered histogram contains no scattering and thus gives the same ‘relevant’ information as the unclustered one, the clustered entropy can be used as an unsaturated measure of
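Under the same simplifying assumptions as above (a gap-based clustering standing in for CLOSE), the defining property can be demonstrated: a noise-scattered but quasi-homogeneous histogram has high plain entropy but zero clustered entropy:

```python
import numpy as np

def entropy(hist):
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log(p)).sum())

def clustered_entropy(hist, scattering_limit=3):
    """Shannon entropy of the clustered histogram. The clustering
    here is a simplified gap-based stand-in for CLOSE [1]: occupied
    levels closer than the scattering limit are merged into one
    cluster, and each cluster contributes its total frequency as a
    single degenerate level."""
    occupied = np.flatnonzero(hist)
    breaks = np.where(np.diff(occupied) > scattering_limit)[0] + 1
    groups = np.split(occupied, breaks)
    clustered = np.array([hist[g].sum() for g in groups], dtype=float)
    return entropy(clustered)

# A noise-scattered but quasi-homogeneous sample: the plain entropy
# saturates, while the clustered entropy is zero.
hist = np.zeros(256, dtype=int)
hist[98:103] = [1, 6, 30, 7, 2]
assert entropy(hist) > 0.5
assert clustered_entropy(hist) == 0.0
```

Two well-separated clusters, by contrast, yield a clustered entropy of log 2, exactly as a two-level edge window should.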

The clustered entropy as an edge detector

From the experiments shown in Fig. 5, Fig. 6 we can see that the clustered entropy improves on the classic entropy, in the sense that it is a measure of information more robust against effects such as blurring, noise and shadows. In this Section, the clustered entropy of some samples of real images is compared with the unclustered entropy. Then, an algorithm to detect edges in images, based on a search for local maxima of clustered entropies, is presented.
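A possible shape for the final local-maxima search over the matrix of clustered entropies (a simplified stand-in; the paper's actual criterion may differ) is:

```python
import numpy as np

def local_maxima(ent, threshold=0.1):
    """Illustrative local-maxima search over an entropy matrix: a
    pixel is kept as an edge point if its value exceeds the threshold
    and is a maximum among its immediate neighbors along the row or
    the column direction. This is a simplified sketch, not the
    paper's published search."""
    padded = np.pad(ent, 1, mode='edge')
    c = padded[1:-1, 1:-1]
    row_max = (c >= padded[1:-1, :-2]) & (c >= padded[1:-1, 2:])
    col_max = (c >= padded[:-2, 1:-1]) & (c >= padded[2:, 1:-1])
    return (c > threshold) & (row_max | col_max)

# A vertical ridge of high entropy is kept; the flat background is not.
ent = np.zeros((5, 5))
ent[:, 2] = 1.0
edges = local_maxima(ent)
assert edges[:, 2].all() and edges.sum() == 5
```

The directional test (row or column maximum) thins the entropy ridge to a one-pixel-wide edge, much like non-maximum suppression in gradient-based detectors.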

Experiments

Experiments in this Section are divided into two parts: synthetic and natural images. In all of them, the results of four different methods are shown to facilitate visual comparison: the Sobel operator [9], [13], the Canny filter [4], the JS detector [2], [6], [10], and the clustered entropy CH. In all the experiments, parameter values have been tested and finally chosen to achieve a reasonably good result, aiming at an optimal balance between false positives and false negatives. For

Conclusions

The clustered entropy is presented in this paper as a new measure in the field of Information Theory that can be used in many fields of Signal Processing. Here it has been applied to edge detection in gray-level images. Given a probability distribution or histogram of an image sample, a clustered histogram is obtained by grouping gray levels. Then, the clustered entropy is defined as the Shannon entropy of the clustered histogram.

Mathematical properties of the clustered entropy are

Acknowledgment

This work has been partially supported by the Spanish Ministry for Science, Innovation and Universities under grant FIS2017-90102-R.

The authors thank Angela Tate for the English revision of the paper, as well as the anonymous reviewers for their careful reading of our manuscript and their many insightful comments and suggestions.
