A genetic programming framework for content-based image retrieval
Introduction
Advances in data storage and image acquisition technologies have allowed the creation of large image data sets. In order to deal with these data, it is necessary to develop appropriate information systems which can support different services. The focus of this paper is on content-based image retrieval (CBIR) systems [1]. Basically, CBIR systems try to retrieve images similar to a user-defined specification or pattern (e.g., shape sketch, image example). Their goal is to support image retrieval based on content properties (e.g., shape, texture, and color).
A feature extraction algorithm encodes image properties into a feature vector, and a similarity function computes the similarity between two images as a function of the distance between their feature vectors. An image database can be indexed by using multiple pairs of feature extraction algorithms and similarity functions. We call each pair a database descriptor, because it tells how the images are distributed in the distance space. By replacing the similarity function, for example, we can make groups of relevant images more or less compact, and increase or decrease their separation [2]. These descriptors are commonly chosen in a domain-dependent fashion and are generally combined in order to meet users’ needs. For example, while one user may wish to retrieve images based on their color features, another may wish to retrieve images according to their texture properties.
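The notion of a descriptor as a pair can be illustrated with a minimal Python sketch. The class name `Descriptor`, the toy extractor `mean_intensity`, and the similarity `inverse_euclidean` are hypothetical and chosen only for illustration; they are not the descriptors used in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import math

Vector = Sequence[float]


@dataclass(frozen=True)
class Descriptor:
    """A descriptor pairs a feature extractor with a similarity function."""
    extract: Callable[[object], Vector]
    similarity: Callable[[Vector, Vector], float]

    def compare(self, image_a, image_b) -> float:
        # Extract both feature vectors, then compare them.
        return self.similarity(self.extract(image_a), self.extract(image_b))


def mean_intensity(image):
    # Hypothetical toy extractor: a single feature, the mean pixel value.
    pixels = [value for row in image for value in row]
    return [sum(pixels) / len(pixels)]


def inverse_euclidean(u, v):
    # Assumed similarity: decreases as the Euclidean distance grows;
    # identical vectors give 1.0.
    return 1.0 / (1.0 + math.dist(u, v))


toy = Descriptor(extract=mean_intensity, similarity=inverse_euclidean)
img_a = [[0, 0], [2, 2]]  # mean 1.0
img_b = [[1, 1], [1, 1]]  # mean 1.0
img_c = [[5, 5], [5, 5]]  # mean 5.0
```

Swapping `inverse_euclidean` for another function changes how the same feature vectors are distributed in the distance space, which is why the pair, and not the feature vector alone, is treated as the descriptor.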
Feature vector and descriptor do not have the same meaning here. The importance of considering the pair, feature extraction algorithm and similarity function, as a descriptor should be better understood. In CBIR systems, it is common to find solutions that combine image features irrespective of the similarity functions [3]. However, these techniques do not make sense, for example, when the image content is a shape and the properties are curvature values along it and color/texture properties inside it. The similarity function usually has a crucial role in making the descriptor as invariant as possible to changes in image scale and rotation. This is true even when we consider only shape descriptors. It does not make sense, for example, to combine multiscale fractal dimensions [2] with beam angle statistics (BAS) [4] irrespective of their similarity functions. The importance of the similarity function coupled with the feature extraction algorithm is illustrated in Fig. 1. Precision–recall curves were computed from an MPEG-7 part B database [5] for four different descriptors. These descriptors pair the feature extraction algorithms that encode BAS [4] and segment saliences (SS) [6] with two similarity functions: the Euclidean metric and matching by optimum correspondent subsequence (OCS) [7]. We are not mixing properties, only replacing similarity functions, to show their role in the effectiveness of each descriptor. Both SS and BAS were originally proposed with OCS. Fig. 1 shows that the configurations which use OCS yield the best effectiveness.
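For readers unfamiliar with precision–recall curves, the following sketch shows one common way to obtain the curve points from a single ranked result list. The function name and the toy ranking are illustrative assumptions; the exact evaluation protocol behind Fig. 1 is not reproduced here.

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """Return (recall, precision) pairs measured at each relevant hit."""
    relevant = set(relevant_ids)
    if not relevant:
        return []
    points = []
    hits = 0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            # recall: fraction of relevant images found so far
            # precision: fraction of retrieved images that are relevant
            points.append((hits / len(relevant), hits / rank))
    return points


# Toy ranking: "a", "b", "c" are relevant; "x", "y" are not.
ranking = ["a", "x", "b", "y", "c"]
points = precision_recall_points(ranking, {"a", "b", "c"})
```

A descriptor whose curve stays higher over the recall range places relevant images earlier in the ranking.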
At a higher level, we wish to combine descriptors encoding several properties in order to address the semantic gap problem: it is not easy for a user to map her/his visual perception of an image into low-level features. Without mixing distinct properties in the same feature vector, this combination could be done by weighting the similarity values resulting from different descriptors [8], [9], [10]. However, more complex functions than a linear combination are likely to provide more flexibility in matching the results with the users’ expectations. We address the problem by presenting a genetic programming (GP) framework for the design of combined similarity functions. Our solution relies on the creation of a composite descriptor, which is simply the combination of pre-defined descriptors using the GP technique. We employ GP to combine the similarity values obtained from each descriptor, creating a more effective fused similarity function. As far as we know, this approach is original and opens a new and productive field for investigation (considering, for example, different applications, descriptors, and GP parameters).
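To make the idea of a nonlinear combined similarity function concrete, the sketch below evaluates a small expression tree over per-descriptor similarity values. The tuple encoding, the operator set (`+`, `*`, protected `/`, `sqrt`), and the leaf names `d1` to `d3` are assumptions made for illustration and are not necessarily the function set used in the framework.

```python
import math

# Assumed operator set; division and square root are "protected" so that
# any tree evaluates to a finite number.
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 1.0,
    "sqrt": lambda a: math.sqrt(abs(a)),
}


def evaluate(tree, sims):
    """Evaluate an expression tree given similarity values per descriptor."""
    if isinstance(tree, str):           # leaf: name of a simple descriptor
        return sims[tree]
    if isinstance(tree, (int, float)):  # leaf: numeric constant
        return float(tree)
    op, *children = tree                # inner node: (operator, child, ...)
    return OPS[op](*(evaluate(child, sims) for child in children))


# One candidate individual: d1 * d2 + sqrt(d3)
tree = ("+", ("*", "d1", "d2"), ("sqrt", "d3"))
sims = {"d1": 0.5, "d2": 0.4, "d3": 0.25}
combined = evaluate(tree, sims)
```

In a GP setting, each individual in the population would be one such tree, and crossover or mutation would exchange or alter subtrees.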
Our motivation for choosing GP stems from its success in many other machine learning applications [11], [12], [13]. Some works, for example, show that GP can provide better results for pattern recognition than classical techniques, such as Support Vector Machines [14]. Unlike previous approaches based on genetic algorithms (GAs), which learn the weights of a linear combination function [15], our framework allows nonlinear combination of descriptors. It is validated through several experiments with two image collections under a wide range of conditions, where the images are retrieved based on the shape of their objects. These experiments demonstrate the effectiveness of the framework according to various evaluation criteria, including precision–recall curves, and using a GA-based approach (its natural competitor) as one of the baselines. Given that it is not based on feature combination, the framework is also suitable for information retrieval from multimodal queries, for example, queries that combine text, image, and audio.
The remainder of this paper is organized as follows. Section 2 gives the background information on GAs and GP. Section 3 introduces a generic model for CBIR which includes the notion of simple and composite descriptors. Section 4 presents a formal definition of the combination function discovery problem and describes our framework based on GP. Section 5 describes several experiments that validate our approach, while Sections 6 and 7 discuss the main results and related work, respectively. In Section 8 we conclude the paper, explaining implications of this study and presenting future research directions.
Section snippets
Genetic programming
GAs [16] and GP [11] belong to a set of artificial intelligence problem-solving techniques based on the principles of biological inheritance and evolution. Each potential solution is called an individual (i.e., a chromosome) in a population. Both GA and GP work by iteratively applying genetic transformations, such as crossover and mutation, to a population of individuals to create more diverse and better performing individuals in subsequent generations. A fitness function is available to assign
CBIR model
In this section, we formalize how a CBIR system can be modeled.
Definition 1
An image $\hat{I}$ is a pair $(D_I, \vec{I})$, where: $D_I$ is a finite set of pixels, and $\vec{I}$ is a function that assigns to each pixel $p$ in $D_I$ a vector $\vec{I}(p)$ of values in some arbitrary space (for example, $\mathbb{R}^3$ when a color in the RGB system is assigned to a pixel).
Definition 2
A simple descriptor (briefly, descriptor) $D$ is defined as a pair $(\epsilon_D, \delta_D)$, where:
- $\epsilon_D$ is a function, which extracts a feature vector from an image $\hat{I}$.
- $\delta_D$ is a
GP framework for CBIR
The present framework uses GP to combine simple descriptors. This decision stemmed from three reasons: (i) the large size of the search space for combination functions; (ii) previous success of using GP in information retrieval; and (iii) no prior work on applying GP to image retrieval.
The corresponding CBIR system can be characterized as follows. For a given large image database and a given user-defined query pattern (e.g., a query image), the system retrieves a list of images from the
Experiments
The experiments described below were carried out for shape-based descriptors. However, the proposed framework is generic and allows the combination of descriptors that encode different properties (e.g., color and texture).
Results
As mentioned earlier, the objective of an image retrieval system is to match database images to a user's query and place them in descending order of their predicted relevance (similarity).
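The ranking step can be sketched as follows. The function `rank_images`, the one-dimensional toy "images", and the toy similarity are hypothetical placeholders; in the framework the similarity would come from a simple or composite descriptor.

```python
def rank_images(query, database, similarity):
    """Return (image_id, score) pairs sorted by descending similarity."""
    scored = [
        (image_id, similarity(query, image))
        for image_id, image in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: each "image" is reduced to a single number for illustration.
database = {"img1": 0.9, "img2": 0.2, "img3": 0.55}


def toy_similarity(q, x):
    # Assumed similarity: closer values are more similar.
    return 1.0 - abs(q - x)


ranking = rank_images(0.5, database, toy_similarity)
ranked_ids = [image_id for image_id, _ in ranking]
```

Effectiveness measures are then computed over the order of `ranked_ids`, not over the raw similarity scores.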
Descriptors combination
In general, approaches to descriptor combination rely on assigning weights to indicate the importance of a descriptor [8], [9], [10], [25]. Basically, the higher the weight, the more important a descriptor is assumed to be.
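A weight-based combination of this kind can be written in a few lines. The normalization by the sum of weights and the descriptor names `shape` and `color` are assumptions for illustration; the cited approaches differ in how weights are defined and learned.

```python
def weighted_similarity(sims, weights):
    """Linear combination of per-descriptor similarities, normalized by total weight."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(weights[name] * sims[name] for name in weights) / total


sims = {"shape": 0.8, "color": 0.4}
weights = {"shape": 3.0, "color": 1.0}  # shape is assumed three times as important
score = weighted_similarity(sims, weights)  # (3 * 0.8 + 1 * 0.4) / 4
```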
The main drawback of these approaches is the fact that it is not easy to define good weight values for a given application, or even for a given user in advance. Therefore, several techniques (such as Refs. [26] and [27]) based on user feedback have been proposed to assist
Conclusions
We considered the problem of combining simple descriptors for content-based image retrieval. Our solution uses genetic programming (GP) to discover an effective combination function. The proposed framework was validated for shape-based image retrieval, through several experiments involving two image databases, and many simple descriptors and fitness functions.
We conclude that the new framework is flexible and powerful for the design of effective combination functions. The effectiveness results
Acknowledgments
This work was supported by FAPESP, CNPq, CAPES, FAPEMIG, and Microsoft Research.
About the Author—RICARDO DA SILVA TORRES received his B.Sc. in Computer Engineering from the University of Campinas, Brazil, in 2000. He got his doctorate in Computer Science from the same university in 2004. He has been Professor at Institute of Computing, University of Campinas, since 2005. His research interests include image analysis, content-based image retrieval, image databases, digital libraries, and geographic information systems.
References (39)
- et al. BAS: a perceptual shape descriptor based on the beam angle statistics. Pattern Recognition Lett. (2003)
- et al. Contour salience descriptors for effective image retrieval and analysis. Image Vision Comput. (2007)
- et al. Object detection in multi-modal images using genetic programming. Appl. Soft Comput. (2004)
- et al. Mathematical aggregation operators in image retrieval: effect on retrieval performance and role in relevance feedback. Signal Processing (2005)
- et al. Interactive content-based image retrieval using relevance feedback. Comput. Vision Image Understanding (2002)
- et al. Target detection in SAR imagery by genetic programming. Adv. Eng. Software (1999)
- et al. Object detection using feature subset selection. Pattern Recognition (2004)
- et al. Genetic algorithm based feature selection in SAR images. Image Vision Comput. (2003)
- et al. Image classification: an evolutionary approach. Pattern Recognition Lett. (2002)
- et al. Content-based image retrieval: theory and applications. Rev. Inf. Teór. Apl. (2006)
- A graph-based approach for multiscale shape analysis. Pattern Recognition
- Content-based image retrieval at the end of the early years. IEEE TPAMI
- Shape similarity measure based on correspondence of visual parts. IEEE TPAMI
- Optimal correspondence of string subsequences. IEEE TPAMI
- Genetic Programming: On the Programming of Computers by Means of Natural Selection
- The effects of fitness functions on genetic programming-based ranking discovery for web search. JASIST
About the Author—ALEXANDRE X. FALCÃO received his B.Sc. in Electrical Engineering (1988) from the University of Pernambuco, PE, Brazil. He has worked in image processing and analysis since 1991. In 1993, he received his M.Sc. in Electrical Engineering from the University of Campinas, SP, Brazil. During 1994-1996, he worked at the University of Pennsylvania, PA, USA, on interactive image segmentation for his doctorate. He got his doctorate in Electrical Engineering from the University of Campinas in 1996. In 1997, he developed video quality evaluation methods for Globo TV, RJ, Brazil. He has been Professor at the Institute of Computing, University of Campinas, since 1998, and his research interests include image segmentation and analysis, volume visualization, content-based image retrieval, mathematical morphology, digital TV, medical imaging applications and pattern recognition.
About the Author—MARCOS ANDRÉ GONÇALVES concluded his doctoral degree in Computer Science at Virginia Tech in 2004. He earned a Master degree from University of Campinas (UNICAMP) in 1997 and a Bachelor degree from the Federal University of Ceará (UFC) in 1995, both in Computer Science. He has published 6 book chapters, 16 journal papers, and more than 60 conference/workshop papers in the digital library, databases, and information retrieval fields.
About the Author—JOÃO PAULO PAPA received his B.Sc. in Information Systems from the State University of São Paulo, SP, Brazil. He has worked in image processing since 1999. In 2005, he received his M.Sc. in Computer Science from the Federal University of São Carlos, SP, Brazil. He has been a full Ph.D. student from University of Campinas since 2005, and his research interests include image restoration, pattern recognition and image processing.
About the Author—BAOPING ZHANG is a Software Engineer at Microsoft Corporation. She was previously a member of the Digital Library Research Laboratory at Virginia Tech. She has a Ph.D. in Computer Science from Virginia Tech, and she has worked on text classification and genetic programming.
About the Author—WEIGUO FAN is an associate professor of information systems and computer science at Virginia Tech. He received his Ph.D. in Information Systems from the University of Michigan, Ann Arbor, in July 2002, his M.Sc. in Computer Science from the National University of Singapore in 1997, and his B.E. in Information and Control Engineering from the Xi’an Jiaotong University, PR China, in 1995. His research interests focus on the design and development of novel information technologies — information retrieval, data mining, text/web mining, personalization and knowledge management techniques — to support better business information management and decision making. He has worked on the development of adaptive and intelligent text mining and web mining techniques for more advanced business intelligence applications, such as search engine ranking function discovery and optimization, text summarization, Web-based information extraction and question answering. He has published more than 80 refereed journal and conference papers. His research has appeared in many prestigious information technology journals such as IEEE Transactions on Knowledge and Data Engineering, IEEE Intelligent Systems, Information Systems, Decision Support Systems, ACM Transactions on Internet Technology, Pattern Recognition, etc., and in many leading information technology conferences such as SIGIR, WWW, CIKM, HLT, etc. His text mining research has been cited more than 500 times according to Google Scholar. His research is currently funded by four NSF grants and one PWC grant.
About the Author—EDWARD A. FOX holds a Ph.D. and M.S. in Computer Science from Cornell University, and a B.S. from M.I.T. Since 1983 he has been at Virginia Polytechnic Institute and State University (VPI&SU or Virginia Tech), where he serves as Professor of Computer Science. He directs the Digital Library Research Laboratory and the Networked Digital Library of Theses and Dissertations. He has been (co)PI on over 100 research and development projects. In addition to his courses at Virginia Tech, Dr. Fox has taught about 70 tutorials in over 25 countries. He has given over 60 keynote/banquet/international invited/distinguished speaker presentations, over 140 refereed conference/workshop papers, and over 250 additional presentations.