Finding a small number of regions in an image using low-level features
Introduction
The seemingly straightforward and effortless human task of segmenting objects from their background is extremely difficult to replicate in a computer. Both Ullman [4] and Marr [5] raise the question of the actual goal of segmentation, particularly in a bottom-up manner. Marr asks: “What, for example, is an object, and what makes it so special that it should be recoverable as a region in an image? Is a nose an object? Is a head one? …” They both conclude that it is extremely difficult, if not impossible, either to formulate what should be recovered as a region from an image or to separate complete objects, such as a car or a house, from a complex scene. Although the problem of the unclear definition of an object or the goal of segmentation seems to be unsolvable, the task of object detection and recognition is performed smoothly and accurately within the human visual system, with no sign of ambiguity.
Many computer vision applications, such as object recognition [6], [7], active vision [8], and content-based image retrieval (CBIR) [9], [10] could be made both more efficient and effective if objects of interest were first segmented from their background. In the case of object recognition, especially in a complex scene, the recognition process could be made more efficient and robust even if only a rough estimate of the location and size of the salient objects was obtained [6]. As stated in Ref. [11], the performance of such an object-based attention system depends largely on the quality of the initial segmentation.
In light of the above, this paper is concerned with finding a “small” number of regions in a color image using low-level features. This process is a precursor to the detection of the salient objects in the image [1]. To achieve appropriate color image segmentation, a set of biologically motivated feature maps is extracted from the image. In order to maintain generality of use, no context-dependent information is assumed and an object is defined simply as a coherent and homogeneous region. For a particular application, if higher-level, top-down information is known a priori, this information could be used to group the regions into logical entities that resemble the physical objects.
A considerable amount of research has addressed the image segmentation problem. However, there still does not exist an “off-the-shelf” solution applicable to all types of images. One of the major issues has been the lack of a good measure of the quality of a particular segmentation. In this paper, three different measures are considered, and we have found that a simple threshold-based measure with a manually selected threshold, obtained by extensive experimentation, gave consistently better results than other more complex, statistically based measures.
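The kind of threshold-based quality measure described above can be sketched as follows: a candidate region is accepted as a single segment when the spread of its pixels' features about the region mean falls below a hand-picked threshold. This is a minimal illustration, not the paper's actual measure; the function name and the threshold value are placeholders (the paper's threshold was selected by extensive experimentation).

```python
import numpy as np

def region_is_homogeneous(features, threshold=10.0):
    """Accept a candidate region if the mean distance of its pixels'
    feature vectors from the region mean is below a hand-picked threshold.

    `features` is an (N, D) array, one row per pixel in the region.
    The default threshold is illustrative only.
    """
    f = np.asarray(features, dtype=float)
    spread = np.linalg.norm(f - f.mean(axis=0), axis=1).mean()
    return spread < threshold
```

The appeal of such a measure is exactly what the text notes: despite its simplicity, a well-tuned fixed threshold can behave more consistently across images than more elaborate statistical criteria.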
Parameters are a significant aspect of any mathematical formulation of an algorithm. However, rarely can they be obtained through theoretical arguments. In these circumstances, the optimum values for parameters depend on subjective judgements, such as the degree to which a region approximates an observed object in a scene. To reduce this type of bias, systematic and extensive experimentation has been performed to find suitable parameter values. In the future, a neural net approach could be adopted.
The paper is organized as follows. Section 2 is a short review of the background behind image segmentation. In Section 3, we discuss the color and texture features. The fusion of the feature spaces in order to improve perceptual uniformity will also be presented. Section 4 discusses the particular image segmentation method we employ, as well as some important implementation details. Parameter choices are often the bugaboo of image segmentation algorithms. These are discussed in Section 5, along with some experiments we have done to show the usefulness of our approach. We conclude the paper in Section 6.
Section snippets
Background
The first theory explaining perceptual grouping was the Gestalt theory proposed by Wertheimer in 1912 [12]. He proposed that there is a tendency for humans to seek the most unambiguous and simple interpretation of the visual world. Although introduced at the beginning of the 20th century, this principle remains valid and is the basis for most grouping methods. Only a few aspects of Gestalt theory have been incorporated into computer vision systems, such as similarity,
Feature selection
Before an image can be segmented, it must be transformed into a set of feature maps that permit similarity and surface continuity to be defined. The most commonly used features are color [21], texture [27], and position [28]. Since the objective of segmentation is to achieve image regions that are meaningful to humans, the feature space should also be perceptually uniform.
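To make the idea of a perceptually uniform feature space concrete, here is a minimal sketch that converts an sRGB image to CIELAB (a standard approximately perceptually uniform color space, assuming a D65 white point) and stacks it with normalized pixel position to form one feature vector per pixel. The function names and the choice to use only color and position are illustrative; the paper's feature set also includes texture.

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert an (H, W, 3) sRGB image with values in [0, 1] to CIELAB."""
    # undo sRGB gamma
    c = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # linear RGB -> XYZ (sRGB primaries, D65)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = c @ M.T
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])  # D65 reference white
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def pixel_features(rgb):
    """Stack CIELAB color with normalized (y, x) position, one row per pixel."""
    h, w, _ = rgb.shape
    lab = rgb_to_lab(rgb)
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys / max(h - 1, 1), xs / max(w - 1, 1)], axis=-1)
    return np.concatenate([lab, pos], axis=-1).reshape(-1, 5)
```

In CIELAB, Euclidean distance between two feature vectors approximates perceived color difference far better than it does in raw RGB, which is the property the segmentation stage relies on.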
Non-parametric density estimation for image clustering
The segmentation method used in this research follows the non-parametric clustering approach in [2], [21]. It is based on estimating the underlying density of the data points and allocating each point to one of the identified populations. If the form and number of the underlying population densities can be determined in advance, parametric density estimation methods could be used; otherwise, non-parametric density estimation methods are required.
Non-parametric clustering begins with the
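One common realization of this density-based, non-parametric idea is mean shift: each point climbs the gradient of a kernel density estimate, and points whose ascents converge to the same mode form one cluster. The sketch below is a deliberately simple illustration of that principle, not the clustering procedure of [2], [21]; the bandwidth value and the mode-merging rule are assumptions.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30, tol=1e-3):
    """Minimal mean-shift sketch: move each point uphill on a Gaussian
    kernel density estimate, then group points whose modes coincide."""
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    for _ in range(iters):
        # Gaussian kernel weights between every shifted point and the data
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        new = (w @ points) / w.sum(axis=1, keepdims=True)
        done = np.abs(new - shifted).max() < tol
        shifted = new
        if done:
            break
    # merge modes closer than the bandwidth into one cluster label
    labels = -np.ones(len(points), dtype=int)
    modes = []
    for i, m in enumerate(shifted):
        for j, ref in enumerate(modes):
            if np.linalg.norm(m - ref) < bandwidth:
                labels[i] = j
                break
        else:
            modes.append(m)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)
```

Note that, unlike k-means, nothing here fixes the number of clusters in advance: the count of distinct modes emerges from the data and the bandwidth, which is exactly why such methods suit segmentation when the number of regions is unknown.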
Weights for colour, texture, and position
The purpose of imposing weighting factors for colour, texture, and position features is to normalize their dynamic range and to improve the perceptual uniformity of the combined feature space. Obviously, it would be preferable to evaluate the perceptual differences among these features through psychophysical experiments. However, this was beyond the scope of this paper. Alternatively, these weights can be determined by finding a parameter set that produces the best overall segmentation results.
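The normalization-plus-weighting step described above can be sketched as follows: each feature group is first rescaled to a common unit range, then multiplied by its weight, so that a Euclidean distance in the joint space balances the three cues as intended. The default weights below are placeholders, not the tuned values found by the paper's parameter search.

```python
import numpy as np

def combine_features(color, texture, position, w_c=1.0, w_t=1.0, w_p=1.0):
    """Rescale each (N, d_i) feature group to [0, 1] per dimension,
    apply its weight, and concatenate into one joint feature space."""
    def unit_range(f):
        f = np.asarray(f, dtype=float)
        span = f.max(axis=0) - f.min(axis=0)
        return (f - f.min(axis=0)) / np.where(span > 0, span, 1.0)

    parts = [w_c * unit_range(color),
             w_t * unit_range(texture),
             w_p * unit_range(position)]
    return np.concatenate(parts, axis=1)
```

Without the unit-range step, a feature with a large raw dynamic range (e.g. lightness spanning 0–100) would dominate distances regardless of the weights; normalizing first makes the weights the sole knob for perceptual balance.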
Concluding remarks
The objective of this paper was to examine the problem of finding a “small” number of regions in a colour image using low-level features, with particular emphasis on practical algorithms for content-based image retrieval. Finding such regions is a first step in the detection of the salient objects in the image, then followed by object recognition. To mimic the perceptual grouping mechanism in the human visual system, a number of biologically motivated features for representing the visual
About the Author—HANG FAI LAU received the B.Eng. and M.Eng. degrees in Electrical and Computer Engineering from McGill University in 1997 and 2000, respectively. Since 2000 he has been working at VisionSphere Technologies in software development. His current interest is face recognition.
References (42)
- et al., Finding salient regions in images, J. Comput. Vision Image Understand. (1999)
- et al., A review on image segmentation techniques, Pattern Recognition (1993)
- et al., Symbolic fusion of luminance-hue-chroma features for region segmentation, Pattern Recognition (1999)
- et al., Color image segmentation based on 3-D clustering: morphological approach, Pattern Recognition (1998)
- et al., An adaptive fuzzy c-means algorithm for image segmentation in the presence of intensity inhomogeneities, Pattern Recognition Lett. (1999)
- et al., Unsupervised texture segmentation using Gabor filters, Pattern Recognition (1991)
- et al., Identifying high level features of texture perception, CVGIP: Graph. Models Image Process. (1993)
- A.H.F. Lau, M.D. Levine, Finding perceptually salient “objects” using low-level features, internal report, ...
- et al., Algorithms for clustering data (1988)
- S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, The MIT Press, Cambridge, MA, 1998, pp. 234–235 ...
- Face recognition technology, Active Robot Vision
- Automatic and semiautomatic methods for image annotation and retrieval in query by image content (QBIC), SPIE
- Experimentelle Studien über das Sehen von Bewegung, Z. Psychol.
- Perceptual organization in computer vision: a review and a proposal for a classificatory structure, IEEE Trans. SMC
About the Author—MARTIN D. LEVINE received the B.Eng. and M.Eng. degrees in Electrical Engineering from McGill University, Montreal, in 1960 and 1963, respectively, and the Ph.D. degree in Electrical Engineering from the Imperial College of Science and Technology, University of London, London, England, in 1965. He is currently a Professor in the Department of Electrical and Computer Engineering, McGill University and served as the founding Director of the McGill Center for Intelligent Machines (CIM) from 1986 to 1998. During 1972–1973 he was a member of the Technical Staff at the Image Processing Laboratory of the Jet Propulsion Laboratory, Pasadena, CA. During the 1979–1980 academic year, he was a Visiting Professor in the Department of Computer Science, Hebrew University, Jerusalem, Israel.
His research interests include computer vision, image processing and artificial intelligence, and he has numerous publications to his credit on these topics. As well, he has consulted for various government agencies and industrial organizations in these areas. Dr. Levine is a founding partner of AutoVu Technologies Inc. and VisionSphere Technologies Inc. for which he is the Chief Scientific Officer. He is a member of the Scientific Board of ART Advanced Research Technologies Inc. and AutoVu Technologies Inc.
Dr. Levine has authored the book entitled Vision in Man and Machine and has coauthored Computer Assisted Analyses of Cell Locomotion and Chemotaxis. Dr. Levine is on the Editorial Board of the journal Computer Vision and Understanding, having also served on the Editorial Boards of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE and Pattern Recognition. He was the Editor of the Plenum Book Series on Advances in Computer Vision and Machine Intelligence. He was the General Chairman of the Seventh International Conference on Pattern Recognition held in Montreal during the summer of 1984 and served as President of the International Association of Pattern Recognition during 1988–1990. He was also the founding President of the Canadian Image Processing and Pattern Recognition Society.
Dr. Levine was elected as a Fellow of the Canadian Institute for Advanced Research in 1984. During the period 1990–96 he served as a CIAR/PRECARN Associate. He is a Fellow of the IEEE and the International Association for Pattern Recognition. Dr. Levine was presented with the 1997 Canadian Image Processing and Pattern Recognition Society Service Award for his outstanding contributions to research and education in Computer Vision.