Use of power law models in detecting region of interest

doi:10.1016/j.patcog.2007.01.004

Pattern Recognition

Volume 40, Issue 9, September 2007, Pages 2521-2529

https://doi.org/10.1016/j.patcog.2007.01.004

Abstract

In this paper, we address the semantic extraction of regions of interest. The proposed approach is based on statistical methods and models inspired by linguistic analysis. The models used here are Zipf law and inverse Zipf law. They model the frequency of appearance of the patterns contained in images as power law distributions. These models make it possible to characterize the structural complexity of image textures, and this complexity measure indicates a perceptually salient region in the image. The image is first partitioned into sub-images that are to be compared. Zipf law or inverse Zipf law is applied to each sub-image, and the sub-images are classified according to the characteristics of the power law models involved. The classification method represents the characteristics of the Zipf and inverse Zipf models of each sub-image by a point in a representation space, in which a clustering process is performed. Our method detects regions of interest that are consistent with human perception; inverse Zipf law is particularly significant in this respect. The method performs well compared with more classical detection methods. Alternatively, a neural network can be used for the classification phase.

Introduction

Automatic detection of regions of interest in images is one of the most critical problems in computer vision. For a human observer, detecting a perceptually important region in an image is a natural task that is done instantaneously, but for a machine it is far more difficult, as the machine lacks the cultural references and knowledge needed to identify the content of the scene. One cause of this difficulty is the subjective nature of the notion of region of interest (ROI). In the most general sense, a ROI, as its name suggests, is a part of the image in which the observer shows interest. Of course, this interest is determined not only by the image itself, but also by the observer's own sensitivity. For a given image, different people could find different regions of interest. However, in most cases, regions of interest have features that are visually and structurally distinct from the rest of the image. Some structural characteristics can therefore be used to detect the ROI of an image without making hypotheses about the semantic content of the picture. The detection of the ROI consists in finding a region of the image which appears different from the background with respect to low-level features such as contrast, colour, region size and shape, distribution of contours or texture pattern. Different methods have been proposed to detect regions of interest in an image. Some are based on models of low-level human vision, such as the method proposed by Osberger and Maeder [1], which detects perceptually important regions in the image by building importance maps based on various visual characteristics. The method proposed by Itti et al. [2] is based on multiscale centre-surround contrast, and the method presented by Syeda-Mahmood [3] uses a segmentation into homogeneous colour regions.
Other methods are based on different structural characteristics of the image without explicit reference to human vision. The method proposed by Di Gesu et al. [4] uses symmetry transforms, Stentiford [5] uses dissimilarities in local neighbourhoods, Kadir and Brady [6] use a local measure of entropy to detect salient features in an image, Wang et al. [7] use a wavelet transform and Carlotto and Stein [8] use a fractal model.

In this paper, we use a new approach based on a statistical model which originates in linguistic analysis: Zipf law. This model has proved efficient at representing the distribution of the frequencies of words in the form of a power law. It was discovered in 1949 by Zipf [9] in written English texts, and Zipf distributions have been found in every natural language. Similar power laws have been observed in different domains of human and natural sciences, such as income distribution by Pareto [10] and Gibrat [11], population of cities by Zipf [12] and Gabaix [13], distribution of biological species by Yule [14], distribution of galactic voids by Gaite and Manrubia [15], Internet traffic by Breslau et al. [16] and Huberman et al. [17], the structure of music by Manaris et al. [18] and natural noises by Dellandrea et al. [19]. In the domain of digital images, Zipf law has been used by Vincent et al. [20] to evaluate the distortion of compressed images and by Caron et al. [21] to detect artificial objects in natural environments. The method described in this paper aims to detect the regions of interest in various digital images. A potential application is automatic determination of the ROI for JPEG2000 compression. Some restrictive constraints have been assumed to make the process tractable. We assume the ROI to be detected is a single connected region in the image. The ROI must both have a significant size and appear different from the background with respect to its structural complexity. Most often, the ROI belongs to the foreground of the scene and contains more details than the background, but our method also handles the case where the ROI is more uniform than the background.

The power law model will first be recalled, and then we will present the adaptation that is needed to apply it to image analysis. Then, the detection methods based on Zipf and inverse Zipf laws will be presented, and the influence of initial image segmentation will be discussed.

Section snippets

Power law models

The power law distribution model known as Zipf law or rank-frequency law was determined empirically by Zipf. According to Zipf law, in a topologically ordered set of symbols such as a text, the frequency of occurrence of the different symbol patterns, for example the words of the text, follows a power law. To be more precise, different patterns occur in the signal; they are n-tuples of symbols. Let us denote these patterns by $P_{1}, P_{2}, \dots, P_{R}$. The frequencies of those patterns depend on the signal,
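As an illustration of the rank-frequency law recalled above, the following Python sketch counts the n-tuples of symbols in a sequence, sorts their frequencies by decreasing rank, and estimates the power law exponent as the slope of the log-log plot. The function names, the use of overlapping n-tuples and the least-squares fit are assumptions made for this example; the snippet above does not specify them.

```python
import math
from collections import Counter

def rank_frequencies(symbols, n=1):
    """Frequencies of the n-tuples of `symbols`, sorted so that rank 1 comes first."""
    # Overlapping n-tuples (an assumption of this sketch).
    patterns = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    return sorted(Counter(patterns).values(), reverse=True)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); needs at least two distinct x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def zipf_slope(symbols, n=1):
    """Slope of the rank-frequency plot in log-log coordinates."""
    freqs = rank_frequencies(symbols, n)
    ranks = range(1, len(freqs) + 1)
    return loglog_slope(ranks, freqs)
```

A sequence whose pattern frequencies are exactly proportional to 1/rank gives a slope of -1; real signals only approximate a straight line, so the fitted slope is a summary rather than an exact parameter.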

Application to images

Some years ago, Lindsey and Strömberg [28] used the frequencies of simple features. In this study, the approach is grounded in a law that has been verified in different fields. We show that both Zipf law and inverse Zipf law can be used for image analysis. To adapt the model to image analysis, it is necessary to define the equivalent of the notion of word in the case of images. In order to respect the usual topology that structures the plane of the image, we will use image patterns defined as
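To make the notion of image pattern concrete, here is a minimal Python sketch. It assumes choices that are not stated in the snippet above: a pattern is taken to be a 3 × 3 block of pixels, grey levels are first quantized into a small number of classes so that identical patterns can recur, and blocks are scanned with overlap. The helper names `quantize` and `pattern_counts` are hypothetical.

```python
from collections import Counter

def quantize(image, levels=9, max_value=255):
    """Map grey values in [0, max_value] onto `levels` classes (assumed coding)."""
    return [[min(v * levels // (max_value + 1), levels - 1) for v in row]
            for row in image]

def pattern_counts(image, size=3):
    """Count every size x size block of a 2-D list of pixels, scanned with overlap."""
    h, w = len(image), len(image[0])
    counts = Counter()
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            # A block is stored as a tuple of row tuples so it can be a dict key.
            block = tuple(tuple(image[y + dy][x:x + size]) for dy in range(size))
            counts[block] += 1
    return counts
```

The values of the resulting counter play the role of word frequencies in a text, so they can be sorted by rank and plotted in log-log coordinates.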

Detection using Zipf law

The ROI of the image can be defined as a part of the image whose structural content differs from the background of the picture. Power law models can provide an indication of the structural content of the picture; however, they provide no spatial information by themselves. Therefore, we introduce a more local study followed by a global analysis of the local results. In order to reveal some local information, it is necessary to divide the image into sub-images as shown in Fig. 5. In each of
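The partition into sub-images can be sketched as follows in Python. A regular grid is assumed here, and the function name `split_into_subimages` and its `rows` and `cols` parameters are hypothetical; the snippet above does not say how the grid is chosen.

```python
def split_into_subimages(image, rows, cols):
    """Yield ((r, c), sub_image) for a rows x cols grid over a 2-D list of pixels."""
    h, w = len(image), len(image[0])
    for r in range(rows):
        # Integer bounds keep every pixel in exactly one tile.
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            yield (r, c), [row[x0:x1] for row in image[y0:y1]]
```

Each tile can then be analysed on its own, and the grid index `(r, c)` keeps the spatial information that the power law model alone does not provide.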

Detection using inverse Zipf law

Like Zipf law, inverse Zipf law can be used to detect regions of interest in an image. The image is again partitioned into sub-images, and the characteristics of the inverse Zipf plot of each sub-image are represented by a point in a graph. The characteristics of an inverse Zipf plot that we use are its slope and the number of patterns which appear only once in the picture. On the graphical representation, the horizontal coordinate represents the slope and
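The two characteristics named above can be computed as in the following Python sketch. It assumes the usual form of inverse Zipf law, in which the number of distinct patterns occurring exactly f times is modelled as a power of f, and it estimates the slope by a least-squares fit in log-log coordinates. The fitting method and the name `inverse_zipf_features` are assumptions made for this illustration.

```python
import math
from collections import Counter

def inverse_zipf_features(pattern_frequencies):
    """Return (slope, singletons) from a list of per-pattern occurrence counts."""
    # by_frequency[f] = number of distinct patterns that occur exactly f times
    by_frequency = Counter(pattern_frequencies)
    singletons = by_frequency.get(1, 0)
    points = sorted(by_frequency.items())
    if len(points) < 2:
        return None, singletons  # slope undefined with fewer than two points
    lx = [math.log(f) for f, _ in points]
    ly = [math.log(count) for _, count in points]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return slope, singletons
```

Applied to each sub-image, the function yields one (slope, singletons) pair per sub-image, which is the kind of point that a clustering step could then group.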

Influence of initial segmentation

The size of the ROI depends on the size of the whole image. Here we are going to study the influence of the image size on the characteristics of its pattern distribution. Indeed, the size of the sub-images must be chosen properly. They must be large enough to present a statistically significant pattern distribution but also be small enough to allow a precise determination of the ROI. As an image always aims to show something, and to bring some information to the observer, we have chosen to

Experimental results

The method has been tested on a set of 100 digital photographic images, each containing a distinct ROI which has been determined by a human observer. The images have been gathered from different public image databases. The images are uncompressed or mildly compressed and their size varies from $256 \times 256$ to $2000 \times 1500$. The images have been segmented into the optimal number of sub-images according to their size. The detection is considered to be successful if the subjective ROI is included in the region

Conclusion

Power law models such as Zipf and inverse Zipf laws can be used to characterize the structural complexity of an image. Under the hypotheses stated earlier, which are not very constraining, a method based on those models can automatically detect the ROI of an image, provided that this ROI appears sufficiently distinct from the background. Both Zipf law and inverse Zipf law can be used to detect the ROI; however, the method based on inverse Zipf law produces better results for our application. The method

About the Author: YVES CARON received his Ph.D. in Computer Science in 2004 from the University of Tours and is now working in a private company in the field of image analysis. He has worked on different applications of Zipf law in image analysis.

References (28)

V. Di Gesu et al., Local operators to detect regions of interest, Pattern Recognition Lett. (1997)
C.S. Lindsey et al., Image classification using the frequencies of simple features, Pattern Recognition Lett. (2000)
W. Osberger, A.J. Maeder, Automatic identification of perceptually important regions in an image, in: 11th...
L. Itti et al., A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. (1998)
T.F. Syeda-Mahmood, Data and model-driven selection using colour regions, Int. J. Comput. Vision (1997)
F. Stentiford, An estimator for visual attention through competitive novelty with application to image compression, in:...
T. Kadir et al., Saliency, scale and image description, Int. J. Comput. Vision (2001)
J.Z. Wang et al., Unsupervised multiresolution segmentation for images with low depth of field, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
M.J. Carlotto et al., A method for searching artificial objects on planetary surfaces, J. Br. Interplanet. Soc. (1990)
G.K. Zipf, Human Behavior and the Principle of Least Effort (1949)
V. Pareto, Cours d’économie politique (1897)
R. Gibrat, Les Inégalités Économiques (1931)
B.M. Hill, Zipf's law and prior distributions for the composition of a population, J. Am. Statist. Assoc. (1970)
X. Gabaix, Zipf's law for cities: an explanation, Q. J. Econ. (1999)

Cited by (22)

A review on the characterization of signals and systems by power law distributions
2015, Signal Processing
Citation Excerpt :
Researchers plotted a log–log chart of the decreasing frequencies of the patterns vs. their rank, and concluded that Zipf law was a good fit, despite the fact that two linear zones appeared in the graph. Caron et al. [10] concluded that both Zipf and the inverse Zipf laws could be used to detect ROI in images, since the differences observed in the log–log graphs of these distributions helped to understand better the details in an image. In Fig. 2, we observe that the cluster {EQ,FF,TO} corresponds to natural phenomena, and the cluster {IP,ID,AC,LC,RP,TE,TA} reflects cases involving human activities.
Power laws, also known as Pareto-like laws or Zipf-like laws, are commonly used to explain a variety of real world distinct phenomena, often described merely by the produced signals. In this paper, we study twelve cases, namely worldwide technological accidents, the annual revenue of America's largest private companies, the number of inhabitants in America's largest cities, the magnitude of earthquakes with minimum moment magnitude equal to 4, the total burned area in forest fires occurred in Portugal, the net worth of the richer people in America, the frequency of occurrence of words in the novel Ulysses, by James Joyce, the total number of deaths in worldwide terrorist attacks, the number of linking root domains of the top internet domains, the number of linking root domains of the top internet pages, the total number of human victims of tornadoes occurred in the U.S., and the number of inhabitants in the 60 most populated countries. The results demonstrate the emergence of statistical characteristics, very close to a power law behavior. Furthermore, the parametric characterization reveals complex relationships present at higher level of description.
A novel monochromatic cue for detecting regions of visual interest
2014, Image and Vision Computing
Finding regions of interest (ROIs) is a fundamentally important problem in the area of computer vision and image processing. Previous studies addressing this issue have mainly focused on investigating chromatic cues to characterize visually salient image regions, while less attention has been devoted to monochromatic cues. The purpose of this paper is the study of monochromatic cues, which have the potential to complement chromatic cues, for the detection of ROIs in an image. This paper first presents a taxonomy of existing ROI detection approaches using monochromatic cues, ranging from well-known algorithms to the most recently published techniques. We then propose a novel monochromatic cue for ROI detection. Finally, a comparative evaluation has been conducted on large scale challenging test sets of real-world natural scenes. Experimental results demonstrate that the use of our proposed monochromatic cue yields a more accurate identification of ROIs. This paper serves as a benchmark for future research on this particular topic and a steppingstone for developers and practitioners interested in adopting monochromatic cues to ROI detection systems and methodologies.
Visual search: A large-scale perspective
2013, Handbook of Statistics
Citation Excerpt :
In this section, we will motivate why we need large data for many interesting problems in computer vision. When dealing with large amounts of categorical data (such as words in dictionary) or distribution of integrated quantities (such as an area of connected components, count of patterns of pixels (Caron et al., 2007)), it has been empirically observed that they follow Zipf’s law (Newman, 2005). Zipf’s law is a discrete power law probability distribution, that is long tailed.
Advances in the proliferation of media devices as well as Internet technologies have generated massive image data sets and made them easier to access and share today. These large-scale data sources not only provide rich test beds for solving existing computer vision problems, but also pose a unique challenge for large-scale data processing that demands an effective information retrieval system to browse and search. This is also motivated by many real-world applications where visual search has been shown to offer compelling interfaces and functionalities by capturing visual attributes better than modalities such as audio, text, etc. In this chapter, we describe state-of-the-art techniques in large-scale visual search. Specifically, we outline each phase in a typical retrieval pipeline including information extraction, representation, indexing, and matching, and we focus on practical issues such as memory footprint and speed while dealing with large data sets. We tabulate several public data sets commonly used as benchmarks, along with their summary. The scope of data reveals a wide variety of potential applications for a vision-based retrieval system. Finally, we address several promising research directions by introducing some other core components that serve to improve the current retrieval system.
A saliency map based on sampling an image into random rectangular regions of interest
2012, Pattern Recognition
Citation Excerpt :
Several approaches to saliency which employ power laws have thus been proposed. A Zipf's law based saliency map has been proposed in [23]. This work is similar to those of [21,22], except that it operates directly on the pixels rather than on the spectral space.
In this article we propose a novel approach to compute an image saliency map based on computing local saliencies over random rectangular regions of interest. Unlike many of the existing methods, the proposed approach does not require any training bases, operates on the image at the original scale and has only a single parameter which requires tuning. It has been tested on the two distinct tasks of salient region detection (using MSRA dataset) and eye gaze prediction (using York University and MIT datasets). The proposed method achieves state-of-the-art performance on the eye gaze prediction task as compared with nine other state-of-the-art methods.
A review of power laws in real life phenomena
2012, Communications in Nonlinear Science and Numerical Simulation
Citation Excerpt :
They obtained a linear trace, concluding that this distribution fitted the number of distinct patterns, with respect to their frequencies. Caron et al. [20] concluded that both Zipf and the inverse Zipf laws could be used to detect ROI in image, since the differences observed in the log–log graphs of these distributions helped to understand better the details in an image. This section investigates the cumulative distributions of several random variables of different kinds.
Power law distributions, also known as heavy tail distributions, model distinct real life phenomena in the areas of biology, demography, computer science, economics, information theory, language, and astronomy, amongst others. In this paper, it is presented a review of the literature having in mind applications and possible explanations for the use of power laws in real phenomena. We also unravel some controversies around power laws.
Transition thresholds and transition operators for binarization and edge detection
2010, Pattern Recognition
Citation Excerpt :
They divide a gray-intensity image into several regions and decide how to binarize each region. (2) Caron et al. [3] detect region of interest characterizing each pixel with a gray-intensity template of 3×3, the frequency of which appears to obey a power law distribution. (3) Milewski et al. [4] presented a methodology for separating handwritten foreground pixels, from background in carbon copied medical forms.
The transition method for image binarization is based on the concept of t-transition pixels, a generalization of edge pixels, and t-transition sets. We introduce a novel unsupervised thresholding for unimodal histograms to estimate the transition sets. We also present dilation and incidence transition operators to refine the transition set. Afterward, we propose the simple edge transition operator for detecting edges. Our experiments show that the new approach increases the effectiveness of OCR applications outperforming several top-ranked binarization algorithms.


About the Author: PASCAL MAKRIS is currently working as Associate Professor in the Laboratoire d’Informatique of the University of Tours, where he received his Ph.D. He is a member of the Reconnaissance de Formes et Analyse d’Image team (RFIA). He is interested in the use of power laws to study different phenomena, in sound and in images. He works particularly in the field of medical images.

About the Author: NICOLE VINCENT has been a full Professor since 1996. She presently heads the Center of Research in Computer Science (CRIP5) and the team Systèmes Intelligents de Perception (SIP) at the university Paris Descartes–Paris 5. After studying at the Ecole Normale Supérieure and graduating in Mathematics, Nicole Vincent received a Ph.D. in Computer Science in 1988 from Lyon Insa. She has been involved in several projects in pattern recognition, signal and image processing and video analysis. Her research interests concern environmental perception by vision in control, security and medical applications.
