Design of a two-stage content-based image retrieval system using texture similarity

https://doi.org/10.1016/S0306-4573(02)00097-3

Abstract

Efficacy and efficiency are two important issues in designing content-based image retrieval systems. In this paper, we present an efficient two-stage image retrieval system with high retrieval efficacy, based on two novel texture features: the composite sub-band gradient (CSG) vector and the energy distribution pattern string (EDP-string). Both features are generated from the sub-images of a wavelet decomposition of the original image. At the first stage, a fuzzy matching process based on EDP-strings serves as a signature filter that quickly removes a large number of non-promising database images from further consideration. At the second stage, the images passing through the filter are compared with the query image based on their CSG vectors for detailed feature inspection. Using a database of 2400 images obtained from the Brodatz album, we demonstrate that our proposed system achieves both high efficacy and high efficiency simultaneously.

Introduction

Research on content-based image retrieval systems has become pervasive in recent years due to the advent of large storage media and computer communication techniques. The traditional approach to image retrieval is to annotate images with text and then use a text-based database management system to perform retrieval. Using keywords or text phrases associated with images to achieve visual information retrieval has several drawbacks. First, annotating images manually is very labor-intensive and time-consuming. Second, it is extremely difficult to describe the content of different types of images in human language; keywords become especially inadequate as the image database grows. Third, there is always a gap between the user and the system because keywords are inherently subjective. As a consequence, the performance of the text-based approach to image retrieval is very sensitive to the keywords employed by the user and the system.

To overcome the difficulties encountered by text-based image retrieval systems, content-based image retrieval (CBIR) was proposed in the early 1990s (Rui & Huang, 1999; Smeulders, Worring, Santini, Gupta, & Jain, 2000). In CBIR, the system discriminates and retrieves images from the database based on their visual contents, such as shape, color, or a combination of both (Jain & Vailaya, 1996; Mehtre et al., 1997, 1998). In such a system, more objective low-level image descriptions can be automatically extracted by machines and subsequently used as discriminating features for image retrieval. However, machine-extractable low-level image features are by no means a replacement for keywords. Instead, keywords can be used together with low-level image features as metadata or high-level descriptions to classify images in advance, before more detailed visual contents are inspected, to facilitate image retrieval. Since text-based retrieval has been extensively discussed in the literature, we focus only on the issues of CBIR in this paper.

There has been growing interest in CBIR in the last decade. Examples are the QBIC system (Flickner et al., 1995), the Photobook system (Pentland, Picard, & Sclaroff, 1994), the VisualSEEk system (Smith & Chang, 1996), the Virage search engine (Bach et al., 1996), the Four Eyes system (Minka, 1995; Minka & Picard, 1997), the NeTra system (Ma & Manjunath, 1997), the CANDID system (Kelly, Cannon, & Hush, 1995), and WaveGuide (Liang & Kuo, 1999). Generally speaking, these systems provide a set of image features for content description and similarity measurement, as well as a “query-by-visual-example” (Hirata & Kato, 1992) man-machine interface for image retrieval. Although various CBIR techniques have been established and good performance results have been demonstrated, many open issues remain to be solved.

In this paper, we propose a two-stage CBIR system that can efficiently and accurately retrieve images from an image database based on two novel texture features. Texture provides surface characteristics for the analysis of many types of images, including natural scenes, remotely sensed data, and biomedical modalities, and it plays an important role in the human visual system for recognition and interpretation. Our system performs image retrieval by texture similarity, assuming that the images stored in the database can be discriminated by their textures. To invoke the system, the user provides an example texture image as the query; the system then tries to retrieve from the database the images with similar texture attributes. A large number of non-promising images are expected to be filtered out at the first stage, so the effort spent on comparing detailed features of images at the second stage is significantly reduced. The issues addressed in this paper include the methods of texture feature extraction, the algorithm for similarity matching, and the signature-filtering method used to speed up image retrieval. A prototype system was written in C and implemented on a Pentium II 350 PC. The database contained 2400 texture images of 128×128 pixels, cropped without overlapping from a set of 150 textures of 512×512 pixels selected from the Brodatz album (Brodatz, 1966). Two images are regarded as similar only if they are cropped from the same original image; we adopt this rigorous criterion to avoid any possible influence of subjective factors. Two feature descriptors, one coarse and one detailed, are associated with each image in our prototype system. Both are derived from the wavelet transform (Daubechies, 1988, 1990; Mallat, 1989) of the original image.
In image retrieval, the coarse feature descriptor (the energy distribution pattern string, or EDP-string) is used at the first stage to quickly screen non-promising images from further consideration. The detailed feature descriptor (the composite sub-band gradient vector, or CSG vector) is subsequently used at the second stage to find the truly matched images. The discriminatory power of the two feature descriptors is fully discussed in this paper. Our experimental results demonstrate that about 93% efficacy can be achieved in image retrieval. With our signature-filtering scheme, image retrieval becomes 2–5 times faster depending on the efficacy requirement: in our prototype system containing 2400 images, running on a Pentium II 350 PC, the system achieved 90.58% efficacy with a response time of 21.55 ms per query, and 92.61% efficacy with a response time of 54.16 ms per query.


Efficacy of image retrieval

In a CBIR system, the performance of image retrieval is usually measured by the following widely used formula (Kankanhalli, Mehtre, & Wu, 1996):

  η_T = n/N  if N ⩽ T
  η_T = n/T  if N > T

where n is the number of similar images retrieved, N is the total number of similar images in the database, and η_T is called the efficacy of retrieval for a given short list of size T. If N ⩽ T, η_T reduces to the traditional recall measure of information retrieval. If N > T, η_T computes the precision measure of information retrieval. By
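The efficacy measure translates directly into code; a minimal Python sketch (function and argument names are illustrative, not from the paper):

```python
def efficacy(n_retrieved_similar, n_similar_in_db, short_list_size):
    """Efficacy eta_T (Kankanhalli, Mehtre, & Wu, 1996).

    Reduces to recall (n/N) when N <= T and to precision over the
    short list (n/T) when N > T.
    """
    n, N, T = n_retrieved_similar, n_similar_in_db, short_list_size
    return n / N if N <= T else n / T
```

For example, with 16 relevant images per query (as in the paper's database) and a short list of size T = 16, retrieving 15 of the 16 relevant images gives an efficacy of 15/16 = 0.9375.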

Signature filtering by EDP-strings

In this section, we introduce a special type of signature, called energy distribution pattern string (or EDP-string for short), which will be used to quickly prune off non-promising images and speed up image retrieval. The generation of an EDP-string is based on a pyramidal wavelet decomposition for an image. Notice that there may be several images mapped to the same EDP-string. Conceptually, all texture images associated with the same EDP-string can be envisioned as belonging to the same
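The idea of an energy-based signature can be sketched as follows. This is a minimal illustration only: the sub-band labels and the ranking-by-energy scheme are assumptions for the sketch, not the paper's exact EDP-string construction.

```python
def subband_energy(coeffs):
    """Mean squared coefficient value of one wavelet sub-band."""
    total = sum(c * c for row in coeffs for c in row)
    count = sum(len(row) for row in coeffs)
    return total / count

def edp_string(subbands):
    """Order sub-band labels by decreasing energy to form a coarse signature.

    `subbands` maps a label such as 'LH1' to its coefficient matrix.
    Images whose energy concentrates in the same sub-bands, in the same
    order, map to the same string and fall into the same filter bucket.
    """
    ranked = sorted(subbands, key=lambda k: subband_energy(subbands[k]),
                    reverse=True)
    return '>'.join(ranked)
```

A signature of this kind is cheap to compare (string equality or fuzzy matching), which is what makes it suitable as a first-stage filter.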

Gradient vector

Features derived from gradient direction images can be used for texture analysis (Gorkani & Picard, 1994; Haralick & Shapiro, 1992). Gradient direction images, generated by a gradient operator, reflect the magnitude and direction of the maximal gray-level change at each pixel of an input image. Such information provides important cues for the human visual system. A number of gradient operators, such as the popular Sobel operator (Ballard & Brown, 1982; Haralick & Shapiro, 1992), can be used for generating
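The Sobel operator mentioned above can be sketched in a few lines of Python; this computes gradient magnitude and direction at a single interior pixel of a gray-level image given as a list of rows:

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradient(img, y, x):
    """Gradient magnitude and direction (radians) at interior pixel (y, x)."""
    gx = gy = 0
    for dy in range(3):
        for dx in range(3):
            p = img[y + dy - 1][x + dx - 1]
            gx += SOBEL_X[dy][dx] * p
            gy += SOBEL_Y[dy][dx] * p
    return math.hypot(gx, gy), math.atan2(gy, gx)
```

On a vertical step edge such as `[[0, 0, 10], [0, 0, 10], [0, 0, 10]]`, the center pixel yields a purely horizontal gradient (direction 0), as expected.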

A two-stage architecture of image retrieval

A two-stage system of image retrieval by texture similarity is depicted in Fig. 3. Image collection and image retrieval are the two types of major activities that occur in this system. Feature extraction is always required for both image collection and image retrieval. The features extracted from an image are represented by a CSG vector and an EDP-string. Notice that several different images can be mapped to the same EDP-string. Similarly, different images may also be mapped to the same CSG
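The two-stage flow can be sketched in Python. This is an illustrative simplification: exact EDP-string equality stands in for the paper's fuzzy matching, and Euclidean distance between CSG vectors is assumed for the detailed comparison.

```python
def retrieve(query_edp, query_csg, database, short_list_size):
    """Two-stage retrieval sketch: signature filter, then detailed ranking.

    `database` is a list of (image_id, edp_string, csg_vector) tuples.
    Stage 1 keeps only images whose EDP-string matches the query's;
    stage 2 ranks the survivors by distance between CSG vectors and
    returns the identifiers of the top short_list_size matches.
    """
    candidates = [(iid, csg) for iid, edp, csg in database if edp == query_edp]

    def dist(csg):
        return sum((a - b) ** 2 for a, b in zip(query_csg, csg)) ** 0.5

    ranked = sorted(candidates, key=lambda item: dist(item[1]))
    return [iid for iid, _ in ranked[:short_list_size]]
```

The speedup of the architecture comes from stage 1: only the candidates surviving the cheap signature filter incur the more expensive vector comparison.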

Performance evaluation

We have implemented a prototype two-stage texture image retrieval system in C on a Pentium II 350 PC. A set of 150 images of 512×512 pixels with different textures was selected from the Brodatz album. Each texture image was then partitioned into 16 non-overlapping images of 128×128 pixels, so the image database in our prototype system contains 2400 texture images. Every database image was also used as a query image. We evaluated the efficacy of our image retrieval system based on
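The database construction described above can be sketched as a simple tiling step (a minimal sketch; images are represented as lists of pixel rows):

```python
def crop_tiles(image, tile=128):
    """Partition a square image into non-overlapping tile x tile sub-images.

    A 512x512 texture yields 16 tiles of 128x128, matching the paper's
    database construction (150 textures -> 2400 database images).
    """
    size = len(image)
    tiles = []
    for top in range(0, size, tile):
        for left in range(0, size, tile):
            tiles.append([row[left:left + tile]
                          for row in image[top:top + tile]])
    return tiles
```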

Comparison with other methods

In this section, we compare the discriminatory power of CSG vectors with that of two other closely related texture descriptors, the wavelet energy signature (Wouwer et al., 1999) and the gradient vector (Fountain & Tan, 1998), in terms of the efficacy of image retrieval. The results of the comparison among these three methods are shown in Table 7, where T represents the size of the short list provided by the user.

In our experiment, there are 16 relevant images in the database with respect to each query

Conclusions

Content-based image retrieval systems retrieve desired images from their databases based on the visual cues provided in the query images. Possible visual contents of an image include color, shape, and texture, as well as the spatial relationships between objects. Among them, texture is probably the most difficult feature to characterize. For many black-and-white gray-level images, texture provides the only clue that enables us to discriminate between images. In this paper, we

Acknowledgements

This research work was supported by the National Science Council of the ROC under contract no. NSC 90-2213-E-005-015.

References (26)

  • I. Daubechies, Orthonormal bases of compactly supported wavelets, Communications on Pure and Applied Mathematics (1988)

  • I. Daubechies, The wavelet transform, time-frequency localization and signal analysis, IEEE Transactions on Information Theory (1990)

  • M. Flickner et al., Query by image and video content: the QBIC system, IEEE Computer (1995)