Finding a small number of regions in an image using low-level features
Introduction
The seemingly straightforward and effortless human task of segmenting objects from their background is extremely difficult to replicate in a computer. Both Ullman [4] and Marr [5] raise the question of the actual goal of segmentation, particularly in a bottom-up manner. Marr asks: “What, for example, is an object, and what makes it so special that it should be recoverable as a region in an image? Is a nose an object? Is a head one? …” They both conclude that it is extremely difficult, if not impossible, either to formulate what should be recovered as a region from an image or to separate complete objects, such as a car or a house, from a complex scene. Although the problem of the unclear definition of an object or the goal of segmentation seems to be unsolvable, the task of object detection and recognition is performed smoothly and accurately within the human visual system, with no sign of ambiguity.
Many computer vision applications, such as object recognition [6], [7], active vision [8], and content-based image retrieval (CBIR) [9], [10] could be made both more efficient and effective if objects of interest were first segmented from their background. In the case of object recognition, especially in a complex scene, the recognition process could be made more efficient and robust even if only a rough estimate of the location and size of the salient objects was obtained [6]. As stated in Ref. [11], the performance of such an object-based attention system depends largely on the quality of the initial segmentation.
In light of the above, this paper is concerned with finding a “small” number of regions in a color image using low-level features. This process is a precursor to the detection of the salient objects in the image [1]. To achieve appropriate color image segmentation, a set of biologically motivated feature maps is extracted from the image. In order to maintain generality of use, no context-dependent information is assumed and an object is defined simply as a coherent and homogeneous region. For a particular application, if higher-level, top-down information is known a priori, this information could be used to group the regions into logical entities that resemble the physical objects.
A considerable amount of research has addressed the image segmentation problem. However, there still does not exist an “off-the-shelf” solution applicable to all types of images. One of the major issues has been the lack of a good measure of the quality of a particular segmentation. In this paper, three different measures are considered, and we have found that a simple threshold-based measure with a manually selected threshold, obtained by extensive experimentation, gave consistently better results than other more complex, statistically based measures.
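The kind of threshold-based quality measure described above can be sketched as follows: a candidate region is accepted as a single segment when the spread of its pixels' features about the region mean falls below a hand-picked threshold. This is a minimal illustration, not the paper's actual measure; the function name and the threshold value are placeholders (the paper's threshold was selected by extensive experimentation).

```python
import numpy as np

def region_is_homogeneous(features, threshold=10.0):
    """Accept a candidate region if the mean distance of its pixels'
    feature vectors from the region mean is below a hand-picked threshold.

    `features` is an (N, D) array, one row per pixel in the region.
    The default threshold is illustrative only.
    """
    f = np.asarray(features, dtype=float)
    spread = np.linalg.norm(f - f.mean(axis=0), axis=1).mean()
    return spread < threshold
```

The appeal of such a measure is exactly what the text notes: despite its simplicity, a well-tuned fixed threshold can behave more consistently across images than more elaborate statistical criteria.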
Parameters are a significant aspect of any mathematical formulation of an algorithm. However, rarely can they be obtained through theoretical arguments. In these circumstances, the optimum values for parameters depend on subjective judgements, such as the degree to which a region approximates an observed object in a scene. To reduce this type of bias, systematic and extensive experimentation has been performed to find suitable parameter values. In the future, a neural net approach could be adopted.
The paper is organized as follows. Section 2 is a short review of the background behind image segmentation. In Section 3, we discuss the color and texture features. The fusion of the feature spaces in order to improve perceptual uniformity will also be presented. Section 4 discusses the particular image segmentation method we employ, as well as some important implementation details. Parameter choices are often the bugaboo of image segmentation algorithms. These are discussed in Section 5, along with some experiments we have done to show the usefulness of our approach. We conclude the paper in Section 6.
Section snippets
Background
The first theory explaining perceptual grouping was the Gestalt theory proposed by Wertheimer in 1912 [12]. He proposed that there is a tendency for humans to seek the most unambiguous and simple interpretation of the visual world. Although introduced at the beginning of the 20th century, this principle remains valid and is the basis for most grouping methods. Only a few aspects of Gestalt theory have been incorporated into computer vision systems, such as similarity,
Feature selection
Before an image can be segmented, it must be transformed into a set of feature maps that permit similarity and surface continuity to be defined. The most commonly used features are color [21], texture [27], and position [28]. Since the objective of segmentation is to achieve image regions that are meaningful to humans, the feature space should also be perceptually uniform.
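To make the idea of a perceptually uniform feature space concrete, here is a minimal sketch that converts an sRGB image to CIELAB (a standard approximately perceptually uniform color space, assuming a D65 white point) and stacks it with normalized pixel position to form one feature vector per pixel. The function names and the choice to use only color and position are illustrative; the paper's feature set also includes texture.

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert an (H, W, 3) sRGB image with values in [0, 1] to CIELAB."""
    # undo sRGB gamma
    c = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # linear RGB -> XYZ (sRGB primaries, D65)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = c @ M.T
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])  # D65 reference white
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def pixel_features(rgb):
    """Stack CIELAB color with normalized (y, x) position, one row per pixel."""
    h, w, _ = rgb.shape
    lab = rgb_to_lab(rgb)
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys / max(h - 1, 1), xs / max(w - 1, 1)], axis=-1)
    return np.concatenate([lab, pos], axis=-1).reshape(-1, 5)
```

In CIELAB, Euclidean distance between two feature vectors approximates perceived color difference far better than it does in raw RGB, which is the property the segmentation stage relies on.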
Non-parametric density estimation for image clustering
The segmentation method used in this research follows the non-parametric clustering approach in [2], [21]. It is based on estimating the underlying density of the data points and allocating each point to one of the identified populations. If the form and number of the underlying population densities can be determined in advance, parametric density estimation methods could be used; otherwise, non-parametric density estimation methods are required.
Non-parametric clustering begins with the
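One common realization of this density-based, non-parametric idea is mean shift: each point climbs the gradient of a kernel density estimate, and points whose ascents converge to the same mode form one cluster. The sketch below is a deliberately simple illustration of that principle, not the clustering procedure of [2], [21]; the bandwidth value and the mode-merging rule are assumptions.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30, tol=1e-3):
    """Minimal mean-shift sketch: move each point uphill on a Gaussian
    kernel density estimate, then group points whose modes coincide."""
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    for _ in range(iters):
        # Gaussian kernel weights between every shifted point and the data
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        new = (w @ points) / w.sum(axis=1, keepdims=True)
        done = np.abs(new - shifted).max() < tol
        shifted = new
        if done:
            break
    # merge modes closer than the bandwidth into one cluster label
    labels = -np.ones(len(points), dtype=int)
    modes = []
    for i, m in enumerate(shifted):
        for j, ref in enumerate(modes):
            if np.linalg.norm(m - ref) < bandwidth:
                labels[i] = j
                break
        else:
            modes.append(m)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)
```

Note that, unlike k-means, nothing here fixes the number of clusters in advance: the count of distinct modes emerges from the data and the bandwidth, which is exactly why such methods suit segmentation when the number of regions is unknown.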
Weights for colour, texture, and position
The purpose of imposing weighting factors for colour, texture, and position features is to normalize their dynamic range and to improve the perceptual uniformity of the combined feature space. Obviously, it would be preferable to evaluate the perceptual differences among these features through psychophysical experiments. However, this was beyond the scope of this paper. Alternatively, these weights can be determined by finding a parameter set that produces the best overall segmentation results.
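The normalization-plus-weighting step described above can be sketched as follows: each feature group is first rescaled to a common unit range, then multiplied by its weight, so that a Euclidean distance in the joint space balances the three cues as intended. The default weights below are placeholders, not the tuned values found by the paper's parameter search.

```python
import numpy as np

def combine_features(color, texture, position, w_c=1.0, w_t=1.0, w_p=1.0):
    """Rescale each (N, d_i) feature group to [0, 1] per dimension,
    apply its weight, and concatenate into one joint feature space."""
    def unit_range(f):
        f = np.asarray(f, dtype=float)
        span = f.max(axis=0) - f.min(axis=0)
        return (f - f.min(axis=0)) / np.where(span > 0, span, 1.0)

    parts = [w_c * unit_range(color),
             w_t * unit_range(texture),
             w_p * unit_range(position)]
    return np.concatenate(parts, axis=1)
```

Without the unit-range step, a feature with a large raw dynamic range (e.g. lightness spanning 0–100) would dominate distances regardless of the weights; normalizing first makes the weights the sole knob for perceptual balance.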
Concluding remarks
The objective of this paper was to examine the problem of finding a “small” number of regions in a colour image using low-level features, with particular emphasis on practical algorithms for content-based image retrieval. Finding such regions is a first step in the detection of the salient objects in the image, then followed by object recognition. To mimic the perceptual grouping mechanism in the human visual system, a number of biologically motivated features for representing the visual
About the Author—HANG FAI LAU received the B.Eng. and M.Eng. degrees in Electrical and Computer Engineering from McGill University in 1997 and 2000, respectively. Since 2000 he has been working at VisionSphere Technologies in software development. His current interest is face recognition.
References (42)
- et al., Finding salient regions in images, J. Comput. Vision Image Understand. (1999)
- et al., A review on image segmentation techniques, Pattern Recognition (1993)
- et al., Symbolic fusion of luminance-hue-chroma features for region segmentation, Pattern Recognition (1999)
- et al., Color image segmentation based on 3-D clustering: morphological approach, Pattern Recognition (1998)
- et al., An adaptive fuzzy c-means algorithm for image segmentation in the presence of intensity inhomogeneities, Pattern Recognition Lett. (1999)
- et al., Unsupervised texture segmentation using Gabor filters, Pattern Recognition (1991)
- et al., Identifying high level features of texture perception, CVGIP: Graph. Models Image Process. (1993)
- A.H.F. Lau, M.D. Levine, Finding perceptually salient “objects” using low-level features, internal report, ...
- et al., Algorithms for clustering data (1988)
- S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, The MIT Press, Cambridge, MA, 1998, pp. 234–235 ...
- Face recognition technology, Active Robot Vision
- Automatic and semiautomatic methods for image annotation and retrieval in query by image content (QBIC), SPIE
- Experimentelle Studien über das Sehen von Bewegung, Z. Psychol.
- Perceptual organization in computer vision: a review and a proposal for a classificatory structure, IEEE Trans. SMC
About the Author—MARTIN D. LEVINE received the B.Eng. and M.Eng. degrees in Electrical Engineering from McGill University, Montreal, in 1960 and 1963, respectively, and the Ph.D. degree in Electrical Engineering from the Imperial College of Science and Technology, University of London, London, England, in 1965. He is currently a Professor in the Department of Electrical and Computer Engineering, McGill University and served as the founding Director of the McGill Center for Intelligent Machines (CIM) from 1986 to 1998. During 1972–1973 he was a member of the Technical Staff at the Image Processing Laboratory of the Jet Propulsion Laboratory, Pasadena, CA. During the 1979–1980 academic year, he was a Visiting Professor in the Department of Computer Science, Hebrew University, Jerusalem, Israel.
His research interests include computer vision, image processing and artificial intelligence, and he has numerous publications to his credit on these topics. As well, he has consulted for various government agencies and industrial organizations in these areas. Dr. Levine is a founding partner of AutoVu Technologies Inc. and VisionSphere Technologies Inc. for which he is the Chief Scientific Officer. He is a member of the Scientific Board of ART Advanced Research Technologies Inc. and AutoVu Technologies Inc.
Dr. Levine has authored the book entitled Vision in Man and Machine and has coauthored Computer Assisted Analyses of Cell Locomotion and Chemotaxis. Dr. Levine is on the Editorial Board of the journal Computer Vision and Understanding, having also served on the Editorial Boards of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE and Pattern Recognition. He was the Editor of the Plenum Book Series on Advances in Computer Vision and Machine Intelligence. He was the General Chairman of the Seventh International Conference on Pattern Recognition held in Montreal during the summer of 1984 and served as President of the International Association of Pattern Recognition during 1988–1990. He was also the founding President of the Canadian Image Processing and Pattern Recognition Society.
Dr. Levine was elected as a Fellow of the Canadian Institute for Advanced Research in 1984. During the period 1990–96 he served as a CIAR/PRECARN Associate. He is a Fellow of the IEEE and the International Association for Pattern Recognition. Dr. Levine was presented with the 1997 Canadian Image Processing and Pattern Recognition Society Service Award for his outstanding contributions to research and education in Computer Vision.