Elsevier

Pattern Recognition

Volume 33, Issue 4, April 2000, Pages 671-684

Object localization using color, texture and shape

https://doi.org/10.1016/S0031-3203(99)00079-5

Abstract

We address the problem of localizing objects using color, texture and shape. Given a hand-drawn sketch for querying an object shape, and its color and texture, the proposed algorithm automatically searches the image database for objects which meet the query attributes. The database images do not need to be presegmented or annotated. The proposed algorithm operates in two stages. In the first stage, we use local texture and color features to find a small number of candidate images in the database, and identify regions in the candidate images whose texture and color are similar to the query's. To speed up the processing, the texture and color features are extracted directly from the Discrete Cosine Transform (DCT) compressed domain. In the second stage, we use a deformable template matching method to match the query shape to the image edges at the locations which possess the desired texture and color attributes. This algorithm differs from other content-based image retrieval algorithms in that: (i) no presegmentation of the database images is needed, and (ii) the color and texture features are extracted directly from the compressed images. Experimental results demonstrate the performance of the algorithm and show that substantial computational savings can be achieved by utilizing multiple image cues.

Introduction

We are now living in the age of multimedia, where digital libraries are beginning to play a more and more important role. In contrast to traditional databases, which are mainly accessed by textual queries, digital libraries, including image and video databases, require representation and management using visual or pictorial cues. The current trend in image and video database research reflects this need. A number of content-based image database retrieval systems have been designed and built using pictorial cues including shape, texture, and color. Among them, QBIC (Query By Image Content) [1] can query large on-line image databases using image content (color, texture, shape, and geometric composition). It uses both semantic and statistical features to describe the image content. Photobook [2] is a set of interactive tools for browsing and searching image databases. It uses both semantic-preserving content-based features and text annotations for querying. The Virage search engine enables search using texture, color and composition for images and videos [3], [4]. A novel region segmentation method was used in Ref. [5] to automatically segment regions for color/texture-based image retrieval. Vinod and Murase [6] proposed locating an object by matching the corresponding DCT coefficients in the transform domain. Color, texture and shape features have also been applied to index and browse digital video databases [7]. For all these applications, object shape, as an important visual cue for human perception, plays a significant role. Queries typically involve a set of curves (open or closed) which need to be located in the images or video frames of the database.

In most image retrieval approaches, the challenge is to extract features that are representative of a specific image attribute and, at the same time, able to discriminate images with different attributes. The color histogram [8] is a commonly used color feature; responses to spatial filters tuned to specific scales and orientations are widely used to characterize texture. Invariant moments and histograms of edge turning angles are used as shape features [9]. Once features are extracted to characterize the image property of interest, the matching and retrieval problem reduces to computing similarity in the feature space and finding the database images which are most similar to the query image. However, it is not always clear whether a given set of features is appropriate for a specific application.
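As a concrete illustration of histogram-based color matching of the kind cited above [8], the following sketch computes a coarse RGB histogram and ranks database images by histogram intersection. It is not the authors' implementation; the bin count and the intersection measure are assumptions chosen for illustration.

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Coarse RGB histogram for an (H, W, 3) uint8 image."""
    # Quantize each channel into bins_per_channel levels (0..bins_per_channel-1).
    q = (image.astype(np.uint32) * bins_per_channel) // 256
    # Combine the three channel indices into a single bin index.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()  # normalize so images of different sizes are comparable

def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms, in [0, 1]; 1 means identical distributions."""
    return np.minimum(h1, h2).sum()

def rank_by_color(query_image, database_images):
    """Return database indices ordered from most to least similar to the query."""
    q = color_histogram(query_image)
    scores = [histogram_intersection(q, color_histogram(img)) for img in database_images]
    return np.argsort(scores)[::-1]
```

Because the histograms are normalized, the intersection score is insensitive to image size, which makes it a cheap first-pass similarity measure.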

Feature-based methods can be applied only when the object of interest (and the associated features) has been segmented from the background. Deformable template-based methods [10], [11], [12], [13], [14] do not compute any specific shape features. Various deformable template models have been proposed to perform tasks including image registration, object detection and localization, feature tracking, and object matching. These deformable models are popular because (i) they combine structural knowledge with local image features, and (ii) they are versatile in incorporating variations within an object class. We have proposed one such method for shape matching [11]. The advantage of this method is that it does not compute specific shape features, and no segmentation of the input image is necessary. However, the generality of the approach and the avoidance of segmentation come at the cost of expensive computation. As a result, the Deformable Template Matching (DTM) method is currently better suited to off-line retrieval tasks than to online retrieval.

In order to make the DTM method feasible for online retrieval, we have adopted a hierarchical retrieval scheme which integrates three important image content cues: shape, texture, and color. In the first (screening) stage, the database is browsed using simple and efficient matching criteria. In particular, texture and color features are used as supplementary cues to locate promising regions in the image which are likely to contain the desired objects. This eliminates a large portion of the database images from further consideration. Once a small set of candidate regions is obtained, we use the deformable template matching method in the second stage to localize the objects in the proximity of these regions. A diagram of this system is given in Fig. 1. This hierarchical mechanism can improve both efficiency and accuracy.
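The two-stage flow can be summarized by the following sketch. Here `screen_fn` and `match_fn` stand in for the color/texture screening stage and the deformable template matching stage respectively; the interface is an assumption for illustration, not the system's actual code.

```python
def two_stage_retrieval(query, database, screen_fn, match_fn, top_k=20):
    """Hierarchical retrieval: cheap region screening first, expensive shape matching second.

    screen_fn(image, query) -> list of candidate locations (empty if the image is rejected)
    match_fn(query, image, location) -> (score, localized_shape)
    Both callables are left abstract; they represent the two stages described in the text.
    """
    # Stage 1: browse the whole database with the inexpensive color/texture test.
    candidates = [(image, screen_fn(image, query)) for image in database]
    candidates = [(image, locs) for image, locs in candidates if locs]

    # Stage 2: run the deformable template matcher only near the surviving locations.
    matches = []
    for image, locations in candidates:
        for loc in locations:
            score, shape = match_fn(query, image, loc)
            matches.append((score, image, shape))

    matches.sort(key=lambda m: m[0], reverse=True)
    return matches[:top_k]
```

The point of the hierarchy is that `match_fn` is invoked only for the small fraction of image locations that survive screening.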

The motivation of this work is threefold: (i) the region cues (texture and color) may come naturally as a constraint in the retrieval task, (ii) the region cues may be used to expedite the localization process: the deformable template matching process need not be executed where the region cues are quite different from the desired ones, and (iii) region-based matching methods are more robust to misalignment and position shift than edge-based methods. We use the region information to obtain some good yet coarse initializations. The contributions of this work are as follows: (i) we extract color and texture features directly from the compressed image data, (ii) we use the region attributes to direct the shape-based search to save computational costs, and (iii) we sensibly fuse multiple content cues to efficiently retrieve images from a nonannotated image database where the only information available is the bit stream of the images.

The remainder of the paper is organized as follows. In Section 2 we describe the screening process using color and texture, where these features are extracted from the DCT domain to browse the database and retrieve a small number of images as well as to identify specific locations for the object of interest in these images. In Section 3 we describe the deformable template approach to the shape matching problem, where the query shape is used as a prototype template which can be deformed. We integrate the color, texture, and shape matching in Section 4 and present the two-stage matching algorithm. Experimental results are presented in Section 5. Section 6 summarizes the paper and proposes future work.

Section snippets

Matching using color and texture

Texture and color features have been used in several content-based image database systems to retrieve objects or images of a specific texture and color composition [2], [15], [16], [17]. We use texture and color cues in addition to shape information to localize objects. For example, one may be interested in finding a fish, with a particular shape, color and texture. The texture and color information can be specified in terms of a sample pattern, as in the case “I want to retrieve all fish
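The snippet above is truncated on this page. To make the idea of extracting features directly from the compressed domain concrete, the sketch below computes a simple per-block feature vector from 8×8 DCT coefficient blocks: the DC term summarizes the block's average value (color or intensity), and AC energies grouped into frequency bands give a coarse texture signature without decompressing the image. The band boundaries and the feature set are assumptions for illustration, not the paper's exact features.

```python
import numpy as np

def dct_block_features(dct_block):
    """Feature vector for one 8x8 block of DCT coefficients (e.g. from a JPEG channel)."""
    dc = dct_block[0, 0] / 8.0  # proportional to the block's mean value
    u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    radius = u + v  # crude frequency index of each coefficient
    bands = []
    for lo, hi in [(1, 3), (3, 7), (7, 15)]:  # low, mid, high frequency AC bands
        mask = (radius >= lo) & (radius < hi)
        bands.append(np.sqrt(np.sum(dct_block[mask] ** 2)))
    return np.array([dc, *bands])

def block_feature_map(dct_blocks):
    """dct_blocks: (rows, cols, 8, 8) array of DCT coefficients.
    Returns a (rows, cols, 4) map with one feature vector per block."""
    rows, cols = dct_blocks.shape[:2]
    return np.array([[dct_block_features(dct_blocks[r, c]) for c in range(cols)]
                     for r in range(rows)])
```

Because the coefficients are taken directly from the compressed representation, such block features can be computed over a whole database far more cheaply than pixel-domain filtering.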

Deformable template matching

Shape-based matching is a difficult problem in content-based retrieval due to the following factors:

  • For a query shape, one generally has no prior information about its presence in database images, including the number of occurrences and its location, scale, and orientation.

  • Often, the desired object has not been segmented from the background in the image.

  • There is a need to accommodate both rigid and nonrigid deformations in the query shape.

  • Most quantitative shape features cannot efficiently
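To make the deformable template idea concrete, the sketch below fits a query contour to an image edge map by minimizing an edge-potential energy over a small grid of rigid poses around an initialization supplied by the color/texture stage. It is a simplified stand-in rather than the method of Ref. [11]: only translation, scale and rotation are searched, nonrigid deformations of the prototype are omitted, and `scipy` is assumed to be available for the distance transform.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_potential(edge_map, rho=1.0):
    """Attraction field from a binary edge map: near 0 on edges, rising toward 1 far away."""
    dist = distance_transform_edt(~edge_map.astype(bool))  # distance of each pixel to the nearest edge
    return 1.0 - np.exp(-rho * dist)

def template_energy(template_pts, potential, scale, theta, tx, ty):
    """Mean potential along a rigidly transformed template; lower means a better fit.
    template_pts: (N, 2) array of (x, y) points sampled on the query contour."""
    c, s = np.cos(theta), np.sin(theta)
    pts = scale * template_pts @ np.array([[c, -s], [s, c]]).T + np.array([tx, ty])
    xs = np.clip(np.round(pts[:, 0]).astype(int), 0, potential.shape[1] - 1)
    ys = np.clip(np.round(pts[:, 1]).astype(int), 0, potential.shape[0] - 1)
    return potential[ys, xs].mean()

def localize(template_pts, edge_map, init_xy, scales=(0.8, 1.0, 1.2),
             angles=np.linspace(-0.3, 0.3, 7), search=8):
    """Exhaustive coarse search over pose around an initialization from the screening stage."""
    pot = edge_potential(edge_map)
    best_energy, best_pose = np.inf, None
    x0, y0 = init_xy
    for s in scales:
        for a in angles:
            for dx in range(-search, search + 1, 2):
                for dy in range(-search, search + 1, 2):
                    e = template_energy(template_pts, pot, s, a, x0 + dx, y0 + dy)
                    if e < best_energy:
                        best_energy, best_pose = e, (s, a, x0 + dx, y0 + dy)
    return best_energy, best_pose
```

Restricting the pose search to the neighborhoods found in the screening stage is what keeps this otherwise expensive matching step affordable.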

Integrating texture, color and shape

We have integrated texture, color, and shape cues to improve the performance of the retrieval process. The integrated system operates in two stages. Since region-based matching methods are relatively robust to minor displacements as long as the two matching regions substantially overlap, we browse the database using color and texture in the first stage, so that only a small set of images, and a small number of locations in the candidate images are identified. In the second stage, the identified

Experimental results

We have applied the integrated retrieval algorithm to an image database containing 592 color images of people, animals, birds, fishes, flowers, outdoor and indoor scenes, etc. These images are of varying sizes from 256×384 to 420×562. They have been collected from different sources including the Kodak Photo CD, web sites (Electronic Zoo/Net Vet-Animal Image Collection URL: http://netvet/wusti.edu/pix.htm), and HP Labs. Some sample images from the database are illustrated in Fig. 3.

To gain some

Conclusion

We have proposed an algorithm for object localization using shape, color, and texture. Shape-based deformable template matching methods have the potential in object retrieval because of their versatility and generalizability in handling different classes of objects and different instances of objects belonging to the same shape class. But, one disadvantage in adopting them in content-based image retrieval systems is their computational cost. We have proposed efficient methods to compute texture

Acknowledgements

The authors would like to thank Dr. Hongjiang Zhang of HP Labs for providing some of the test images.


References (26)

  • K. Karu et al., Is there any texture in the image? Pattern Recognition (1996).
  • A.K. Jain et al., Unsupervised texture segmentation using Gabor filters, Pattern Recognition (1991).
  • W. Niblack, R. Barber, W. Equitz, The QBIC project: querying images by content using color, texture, and shape, ...
  • A. Pentland, R.W. Picard, S. Sclaroff, Photobook: tools for content-based manipulation of image databases, Proceedings...
  • J.R. Bach, C. Fuller, A. Gupta, The Virage image search engine: an open framework for image management, Proceedings of...
  • A. Hampapur, A. Gupta, B. Horowitz, C.F. Shu, C. Fuller, J. Bach, M. Gorkani, R. Jain, Virage video engine, Proceedings...
  • W.Y. Ma, B.S. Manjunath, Netra: a toolbox for navigating large image databases, in Proceedings of the International...
  • V.V. Vinod, H. Murase, Object location using complementary color features: histogram and DCT, Proceedings of the 13th...
  • H.J. Zhang et al., Video parsing and browsing using compressed data, Multimedia Tools and Applications (1995).
  • M.J. Swain et al., Color indexing, Int. J. Comput. Vision (1991).
  • A. Vailaya, Y. Zhong, A.K. Jain, A hierarchical system for efficient image retrieval, Proceedings of the 13th...
  • U. Grenander et al., Representation of knowledge in complex systems, J. Roy. Statist. Soc. (B) (1994).
  • A.K. Jain, Y. Zhong, S. Lakshmanan, Object matching using deformable templates, IEEE Trans. Pattern Anal. Mach. Intell....

    About the Author—YU ZHONG received the B.S. and M.S. degrees in Computer Science and Engineering from Zhejiang University, Hangzhou, China in 1988 and 1991, the M.S. degree in Statistics from Simon Fraser University, Burnaby, Canada, in 1993, and the Ph.D. degree in Computer Science from Michigan State University, East Lansing, Michigan, in 1997. She is currently a postdoctoral fellow at Carnegie Mellon University. Her research interests include image/video processing, pattern recognition, and computer vision.

    About the Author—ANIL JAIN is a University Distinguished Professor and Chair of the Department of Computer Science at Michigan State University. His research interests include statistical pattern recognition, Markov random fields, texture analysis, neural networks, document image analysis, fingerprint matching and 3D object recognition. He received the best paper awards in 1987 and 1991 and certificates for outstanding contributions in 1976, 1979, 1992, and 1997 from the Pattern Recognition Society. He also received the 1996 IEEE Trans. Neural Networks Outstanding Paper Award. He was the Editor-in-Chief of the IEEE Trans. on Pattern Analysis and Machine Intelligence (1990–1994). He is the co-author of Algorithms for Clustering Data, Prentice-Hall, 1988, has edited the book Real-Time Object Measurement and Classification, Springer-Verlag, 1988, and co-edited the books, Analysis and Interpretation of Range Images, Springer-Verlag, 1989, Markov Random Fields, Academic Press, 1992, Artificial Neural Networks and Pattern Recognition, Elsevier, 1993, 3D Object Recognition, Elsevier, 1993, and BIOMETRICS: Personal Identification in Networked Society to be published by Kluwer in 1998. He is a Fellow of the IEEE and IAPR, and has received a Fulbright research award.
