Elsevier

Pattern Recognition

Volume 42, Issue 2, February 2009, Pages 283-292
A genetic programming framework for content-based image retrieval

https://doi.org/10.1016/j.patcog.2008.04.010

Abstract

The effectiveness of content-based image retrieval (CBIR) systems can be improved by combining image features or by weighting image similarities, as computed from multiple feature vectors. However, feature combination does not always make sense, and the combined similarity function can be more complex than weight-based functions to better satisfy the users’ expectations. We address this problem by presenting a genetic programming (GP) framework for the design of combined similarity functions. Our method allows nonlinear combination of image similarities and is validated through several experiments, where the images are retrieved based on the shape of their objects. Experimental results demonstrate that the GP framework is suitable for the design of effective combination functions.

Introduction

Advances in data storage and image acquisition technologies have allowed the creation of large image data sets. In order to deal with these data, it is necessary to develop appropriate information systems which can support different services. The focus of this paper is on content-based image retrieval (CBIR) systems [1]. Basically, CBIR systems try to retrieve images similar to a user-defined specification or pattern (e.g., shape sketch, image example). Their goal is to support image retrieval based on content properties (e.g., shape, texture, and color).

A feature extraction algorithm encodes image properties into a feature vector, and a similarity function computes the similarity between two images as a function of the distance between their feature vectors. An image database can be indexed by using multiple pairs of feature extraction algorithms and similarity functions. We call each pair a database descriptor, because it tells how the images are distributed in the distance space. By replacing the similarity function, for example, we can make groups of relevant images more or less compact, and increase or decrease their separation [2]. These descriptors are commonly chosen in a domain-dependent fashion and are generally combined in order to meet users’ needs. For example, while one user may wish to retrieve images based on their color features, another may wish to retrieve images according to their texture properties.
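The descriptor notion above can be made concrete with a minimal Python sketch. It assumes a toy feature extractor (a normalized gray-level histogram) and two interchangeable distance functions; the class and function names are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


@dataclass(frozen=True)
class Descriptor:
    """A descriptor pairs a feature extractor with a distance function."""
    extract: Callable[[Sequence[int]], Vector]
    distance: Callable[[Vector, Vector], float]

    def compare(self, image_a: Sequence[int], image_b: Sequence[int]) -> float:
        """Distance between two images under this descriptor."""
        return self.distance(self.extract(image_a), self.extract(image_b))


def histogram(pixels: Sequence[int], bins: int = 4, levels: int = 256) -> Vector:
    """Toy extractor (an assumption): normalized gray-level histogram."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // levels] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u: Vector, v: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


# Same feature extractor, different distance function: two distinct descriptors.
hist_l2 = Descriptor(histogram, euclidean)
hist_l1 = Descriptor(histogram, manhattan)

dark = [10, 20, 30, 40]         # toy "images" given as flat lists of gray levels
bright = [200, 220, 240, 250]
```

Calling `hist_l1.compare(dark, bright)` and `hist_l2.compare(dark, bright)` gives different values for the same pair of images, which is the sense in which swapping the similarity function changes how images are distributed in the distance space.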

Feature vector and descriptor do not have the same meaning here. The importance of considering the pair, feature extraction algorithm and similarity function, as a descriptor should be better understood. In CBIR systems, it is common to find solutions that combine image features irrespective of the similarity functions [3]. However, these techniques do not make sense, for example, when the image content is a shape and the properties are curvature values along it and color/texture properties inside it. The similarity function usually has a crucial role in making the descriptor as invariant as possible to changes in image scale and rotation. This is true even when we consider only shape descriptors. It does not make sense, for example, to combine multiscale fractal dimensions [2] with beam angle statistics (BAS) [4] irrespective of their similarity functions. The importance of the similarity function coupled with the feature extraction algorithm is illustrated in Fig. 1. Precision–recall curves were computed from an MPEG-7 part B database [5] for four different descriptors. They provide different combinations of feature extraction algorithms that encode BAS [4] and segment saliences (SS) [6], with Euclidean metric and matching by optimum correspondent subsequence (OCS) [7] as similarity functions. We are not mixing properties, only replacing similarity functions, to show their role in the effectiveness of each descriptor. Both SS and BAS have been proposed with OCS. Fig. 1 shows that the configurations which use OCS yield the best effectiveness.
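For readers unfamiliar with precision–recall curves such as those mentioned above, the following is a minimal sketch of how the points of one curve can be computed from a ranked result list and a set of relevant images. The function name and the toy data are assumptions for illustration, not the evaluation code used in the paper.

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """Return (recall, precision) pairs, one per relevant image retrieved."""
    relevant = set(relevant_ids)
    points = []
    hits = 0
    for k, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            # recall = fraction of relevant images found so far,
            # precision = fraction of the top-k results that are relevant
            points.append((hits / len(relevant), hits / k))
    return points


# Toy ranking: "a", "b", "c" are relevant; "x", "y" are not.
ranking = ["a", "x", "b", "y", "c"]
curve = precision_recall_points(ranking, {"a", "b", "c"})
```

A descriptor whose ranking places relevant images earlier keeps precision higher as recall grows, which is how the four configurations in Fig. 1 are compared.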

At a higher level, we wish to combine descriptors encoding several properties in order to address the semantic gap problem: it is not easy for users to map their visual perception of an image into low-level features. Without mixing distinct properties in the same feature vector, this combination could be done by weighting the similarity values resulting from different descriptors [8], [9], [10]. However, functions more complex than a linear combination are likely to provide more flexibility in matching the results with the users’ expectations. We address the problem by presenting a genetic programming (GP) framework for the design of combined similarity functions. Our solution relies on the creation of a composite descriptor, which is simply the combination of pre-defined descriptors using the GP technique. We employ GP to combine the similarity values obtained from each descriptor, creating a more effective fused similarity function. As far as we know, this approach is original and opens a new and productive field for investigation (considering, for example, different applications, descriptors, and GP parameters).
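The difference between a weighted (linear) combination and a nonlinear one can be illustrated with a small expression-tree evaluator. This is only a sketch: the function set, the protected division, the descriptor names `d1` and `d2`, and the two example trees are all assumptions chosen for illustration, not functions reported in the paper.

```python
import math

# Hypothetical function set; "protected" operators avoid domain errors.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
    "sqrt": lambda a: math.sqrt(abs(a)),
}


def evaluate(tree, sims):
    """Evaluate an expression tree over per-descriptor similarity values.

    A tree is a descriptor name (leaf), a numeric constant (leaf),
    or a tuple (operator, child, ...).
    """
    if isinstance(tree, str):
        return sims[tree]
    if isinstance(tree, (int, float)):
        return float(tree)
    op, *children = tree
    return FUNCTIONS[op](*(evaluate(child, sims) for child in children))


# Similarities of one database image to the query under two descriptors.
sims = {"d1": 0.81, "d2": 0.25}

# Weighted combination: 0.7 * d1 + 0.3 * d2
linear = ("+", ("*", 0.7, "d1"), ("*", 0.3, "d2"))

# A nonlinear alternative that weights alone cannot express: sqrt(d1) + d1 * d2
nonlinear = ("+", ("sqrt", "d1"), ("*", "d1", "d2"))
```

Both trees map the same similarity values to a single score through `evaluate`; a weight-learning method can only tune the constants of the first form, whereas a search over trees can also change the shape of the function.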

Our motivation to choose GP stems from its success in many other machine learning applications [11], [12], [13]. Some works, for example, show that GP can provide better results for pattern recognition than classical techniques, such as Support Vector Machines [14]. Unlike previous approaches based on genetic algorithms (GAs), which learn the weights of a linear combination function [15], our framework allows nonlinear combination of descriptors. It is validated through several experiments with two image collections under a wide range of conditions, where the images are retrieved based on the shape of their objects. These experiments demonstrate the effectiveness of the framework according to various evaluation criteria, including precision–recall curves, and using a GA-based approach (its natural competitor) as one of the baselines. Because it is not based on feature combination, the framework is also suitable for information retrieval from multimodal queries (for example, by text, image, and audio).

The remainder of this paper is organized as follows. Section 2 gives the background information on GAs and GP. Section 3 introduces a generic model for CBIR which includes the notion of simple and composite descriptors. Section 4 presents a formal definition of the combination function discovery problem and describes our framework based on GP. Section 5 describes several experiments, which validate our approach, while Sections 6 and 7 discuss the main results and related work, respectively. In Section 8 we conclude the paper, explaining implications of this study and presenting future research directions.

Section snippets

Genetic programming

GAs [16] and GP [11] belong to a set of artificial intelligence problem-solving techniques based on the principles of biological inheritance and evolution. Each potential solution is called an individual (i.e., a chromosome) in a population. Both GA and GP work by iteratively applying genetic transformations, such as crossover and mutation, to a population of individuals to create more diverse and better performing individuals in subsequent generations. A fitness function is available to assign

CBIR model

In this section, we formalize how a CBIR system can be modeled.

Definition 1

An image $\hat{I}$ is a pair $(D_I, I)$, where:

  • $D_I \subset \mathbb{Z}^2$ is a finite set of pixels, and

  • $I: D_I \to D$ is a function that assigns to each pixel $p$ in $D_I$ a vector $I(p)$ of values in some arbitrary space $D$ (for example, $D = \mathbb{R}^3$ when a color in the RGB system is assigned to a pixel).
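As a concrete reading of Definition 1, a tiny RGB image can be represented as a mapping from pixel coordinates to color vectors. This is a sketch under that assumption; the type aliases and variable names are hypothetical.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]              # a point in Z^2
Color = Tuple[float, float, float]   # a value in R^3 (RGB)

# The mapping plays the role of the function I; its keys form the
# finite pixel set D_I.
image: Dict[Pixel, Color] = {
    (0, 0): (255.0, 0.0, 0.0),
    (0, 1): (0.0, 255.0, 0.0),
    (1, 0): (0.0, 0.0, 255.0),
    (1, 1): (255.0, 255.0, 255.0),
}

domain = set(image)       # D_I, the finite set of pixels
value = image[(1, 0)]     # I(p) for the pixel p = (1, 0)
```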

Definition 2

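A simple descriptor (briefly, descriptor) $D$ is defined as a pair $(\epsilon_D, \delta_D)$, where:

  • $\epsilon_D: \hat{I} \to \mathbb{R}^n$ is a function, which extracts a feature vector $v_{\hat{I}}$ from an image $\hat{I}$.

  • $\delta_D: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is a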

GP framework for CBIR

The present framework uses GP to combine simple descriptors. This decision stemmed from three reasons: (i) the large size of the search space for combination functions; (ii) previous success of using GP in information retrieval; and (iii) no prior work on applying GP to image retrieval.
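To give a feel for how GP can search the space of combination functions, here is a deliberately small sketch of an evolutionary loop over expression trees. Everything in it is an assumption made for illustration: the function set, the pairwise-ordering fitness, the elitist selection scheme, the depth limit, the parameter values, and the toy training samples. It is not the framework, fitness functions, or parameter settings used in the paper.

```python
import random

TERMINALS = ["d1", "d2"]  # hypothetical descriptor names (tree leaves)
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b, "max": max}

def random_tree(rng, depth):
    """Grow a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, sims):
    """Combine the per-descriptor similarities in sims as the tree dictates."""
    if isinstance(tree, str):
        return sims[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, sims), evaluate(right, sims))

def tree_depth(tree):
    if isinstance(tree, str):
        return 0
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2]))

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Graft a random subtree of b onto a random node of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)

def mutate(rng, tree):
    """Replace a random node with a freshly grown subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace_at(tree, path, random_tree(rng, 2))

def fitness(tree, samples):
    """Assumed fitness: fraction of (relevant, irrelevant) pairs ranked correctly."""
    pos = [evaluate(tree, s) for s, relevant in samples if relevant]
    neg = [evaluate(tree, s) for s, relevant in samples if not relevant]
    return sum(p > n for p in pos for n in neg) / (len(pos) * len(neg))

def evolve(samples, generations=10, size=20, max_depth=6, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, samples), reverse=True)
        history.append(fitness(population[0], samples))
        elite = population[: size // 4]  # survivors are copied unchanged
        children = []
        while len(elite) + len(children) < size:
            parent = rng.choice(elite)
            child = crossover(rng, parent, rng.choice(elite))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            # Reject oversized offspring to limit tree growth.
            children.append(child if tree_depth(child) <= max_depth else parent)
        population = elite + children
    best = max(population, key=lambda t: fitness(t, samples))
    history.append(fitness(best, samples))
    return best, history

# Toy training data: similarities to a query plus a relevance label.
samples = [
    ({"d1": 0.90, "d2": 0.80}, True),
    ({"d1": 0.80, "d2": 0.90}, True),
    ({"d1": 0.95, "d2": 0.30}, False),
    ({"d1": 0.20, "d2": 0.90}, False),
    ({"d1": 0.50, "d2": 0.50}, False),
]
best, history = evolve(samples)
```

Because the best individuals are carried over unchanged, the values in `history` never decrease from one generation to the next. The depth limit is one common way to keep trees from growing without bound; other selection and bloat-control schemes would work equally well in a sketch like this.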

The corresponding CBIR system can be characterized as follows. For a given large image database and a given user-defined query pattern (e.g., a query image), the system retrieves a list of images from the

Experiments

The experiments described below were carried out for shape-based descriptors. However, the proposed framework is generic and allows the combination of descriptors that encode different properties (e.g., color and texture).

Results

As mentioned earlier, the objective of an image retrieval system is to match database images to a user's query and place them in descending order of their predicted relevance (similarity).
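The ranking step described above can be sketched as follows, assuming each database image already has a similarity value to the query under every descriptor. The function name, the descriptor names, and the product used as the combination function are illustrative assumptions.

```python
def rank_images(query_sims, combine):
    """Order image ids by descending combined similarity to the query.

    query_sims maps image id -> {descriptor name: similarity to the query};
    combine maps one such dictionary to a single score.
    """
    scored = [(combine(sims), image_id) for image_id, sims in query_sims.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Toy database with two hypothetical descriptors, d1 and d2.
database = {
    "img1": {"d1": 0.9, "d2": 0.2},
    "img2": {"d1": 0.6, "d2": 0.7},
    "img3": {"d1": 0.3, "d2": 0.3},
}

# Any combination function can be plugged in; a product is used here.
ranking = rank_images(database, lambda s: s["d1"] * s["d2"])
```

With these values the product ranks `img2` first, even though `img1` has the highest similarity under `d1` alone, which shows how the choice of combination function changes the order presented to the user.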

Descriptors combination

In general, approaches for descriptors combination rely on assigning weights to indicate the importance of a descriptor [8], [9], [10], [25]. Basically, the higher the weight the more important a descriptor is assumed to be.

The main drawback of these approaches is the fact that it is not easy to define good weight values for a given application, or even for a given user in advance. Therefore, several techniques (such as Refs. [26] and [27]) based on user feedback have been proposed to assist

Conclusions

We considered the problem of combining simple descriptors for content-based image retrieval. Our solution uses genetic programming (GP) to discover an effective combination function. The proposed framework was validated for shape-based image retrieval, through several experiments involving two image databases, and many simple descriptors and fitness functions.

We conclude that the new framework is flexible and powerful for the design of effective combination functions. The effectiveness results

Acknowledgments

This work was supported by FAPESP, CNPq, CAPES, FAPEMIG, and Microsoft Research.

About the Author—RICARDO DA SILVA TORRES received his B.Sc. in Computer Engineering from the University of Campinas, Brazil, in 2000. He got his doctorate in Computer Science from the same university in 2004. He has been Professor at Institute of Computing, University of Campinas, since 2005. His research interests include image analysis, content-based image retrieval, image databases, digital libraries, and geographic information systems.

References (39)

  • R.S. Torres et al., A graph-based approach for multiscale shape analysis, Pattern Recognition (2004)
  • A.W.M. Smeulders et al., Content-based image retrieval at the end of the early years, IEEE TPAMI (2000)
  • L.J. Latecki et al., Shape similarity measure based on correspondence of visual parts, IEEE TPAMI (2000)
  • Y.P. Wang et al., Optimal correspondence of string subsequences, IEEE TPAMI (1990)
  • K. Porkaew, S. Mehrotra, M. Ortega, K. Chakrabarti, Similarity search using multiple examples in MARS, in: Visual...
  • M.S. Lew (Ed.), Principles of Visual Information Retrieval—Advances in Pattern Recognition, Springer,...
  • H. Shao, J.-W. Zhang, W.C. Cui, H. Zhao, Automatic feature weight assignment based on genetic algorithm for image...
  • J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
  • W. Fan et al., The effects of fitness functions on genetic programming-based ranking discovery for web search, JASIST (2004)
    About the Author—ALEXANDRE X. FALCÃO received his B.Sc. in Electrical Engineering (1988) from the University of Pernambuco, PE, Brazil. He has worked in image processing and analysis since 1991. In 1993, he received his M.Sc. in Electrical Engineering from the University of Campinas, SP, Brazil. During 1994-1996, he worked at the University of Pennsylvania, PA, USA, on interactive image segmentation for his doctorate. He got his doctorate in Electrical Engineering from the University of Campinas in 1996. In 1997, he developed video quality evaluation methods for Globo TV, RJ, Brazil. He has been Professor at the Institute of Computing, University of Campinas, since 1998, and his research interests include image segmentation and analysis, volume visualization, content-based image retrieval, mathematical morphology, digital TV, medical imaging applications and pattern recognition.

    About the Author—MARCOS ANDRÉ GONÇALVES concluded his doctoral degree in Computer Science at Virginia Tech in 2004. He earned a Master's degree from University of Campinas (UNICAMP) in 1997 and a Bachelor's degree from the Federal University of Ceará (UFC) in 1995, both in Computer Science. He has published 6 book chapters, 16 journal papers, and more than 60 conference/workshop papers in the digital library, databases, and information retrieval fields.

    About the Author—JOÃO PAULO PAPA received his B.Sc. in Information Systems from the State University of São Paulo, SP, Brazil. He has worked in image processing since 1999. In 2005, he received his M.Sc. in Computer Science from the Federal University of São Carlos, SP, Brazil. He has been a full Ph.D. student at the University of Campinas since 2005, and his research interests include image restoration, pattern recognition and image processing.

    About the Author—BAOPING ZHANG is a Software Engineer at Microsoft Corporation. She was previously a member of the Digital Library Research Laboratory at Virginia Tech. She has a Ph.D. in Computer Science from Virginia Tech, and she has worked on text classification and genetic programming.

    About the Author—WEIGUO FAN is an associate professor of information systems and computer science at Virginia Tech. He received his Ph.D. in Information Systems from the University of Michigan, Ann Arbor, in July 2002, his M.Sc. in Computer Science from the National University of Singapore in 1997, and his B.E. in Information and Control Engineering from the Xi’an Jiaotong University, PR China, in 1995. His research interests focus on the design and development of novel information technologies — information retrieval, data mining, text/web mining, personalization and knowledge management techniques — to support better business information management and decision making. He has worked on the development of adaptive and intelligent text mining and web mining techniques for more advanced business intelligence applications, such as search engine ranking function discovery and optimization, text summarization, Web-based information extraction and question answering. He has published more than 80 refereed journal and conference papers. His research has appeared in many prestigious information technology journals such as IEEE Transactions on Knowledge and Data Engineering, IEEE Intelligent Systems, Information Systems, Decision Support Systems, ACM Transactions on Internet Technology, Pattern Recognition, etc., and in many leading information technology conferences such as SIGIR, WWW, CIKM, HLT, etc. His text mining research has been cited more than 500 times according to Google Scholar. His research is currently funded by four NSF grants and one PWC grant.

    About the Author—EDWARD A. FOX holds a Ph.D. and M.S. in Computer Science from Cornell University, and a B.S. from M.I.T. Since 1983 he has been at Virginia Polytechnic Institute and State University (VPI&SU or Virginia Tech), where he serves as Professor of Computer Science. He directs the Digital Library Research Laboratory and the Networked Digital Library of Theses and Dissertations. He has been (co)PI on over 100 research and development projects. In addition to his courses at Virginia Tech, Dr. Fox has taught about 70 tutorials in over 25 countries. He has given over 60 keynote/banquet/international invited/distinguished speaker presentations, over 140 refereed conference/workshop papers, and over 250 additional presentations.
