Elsevier

Neurocomputing

Volume 208, 5 October 2016, Pages 99-107

A novel dynamic multi-model relevance feedback procedure for content-based image retrieval

https://doi.org/10.1016/j.neucom.2016.02.073

Abstract

This paper addresses image retrieval in large databases with a large semantic gap by means of a relevance feedback procedure. We present a novel algorithm for modelling the user's preferences in a content-based image retrieval system.

The proposed algorithm considers the probability of an image belonging to the set of those sought by the user, and estimates the parameters of several local logistic regression models whose inputs are the low-level image features. A Principal Component Analysis method is applied to the original vector to reduce its high dimensionality. The relevance probabilities predicted by these local models are combined by means of a weighted average. These weights are obtained according to the variance explained by the group of principal components used for each local model. These models are dynamically estimated in each iteration of the relevance feedback algorithm until the user is satisfied.

This novel procedure has been tested on a collection with a large semantic gap, the Wikipedia collection. Two types of experiments have been performed, one with an automatic user and another with a typical user. The method is compared with some recent similar approaches in the literature and performs very well in terms of the mean average precision (MAP) evaluation measure.

Introduction

In the last few years, the increasing number of image databases that need effective and efficient techniques for retrieving multimedia information has motivated interest in Content-Based Image Retrieval (CBIR) systems [29], [25], [12]. Compared to traditional image retrieval systems based on textual information, CBIR systems represent an improvement because they take advantage of the digital information stored in the image itself when image collections are not semantically annotated with textual labels. Thus, visual features are extracted from the images in order to describe their content [9], and are later compared with those of the query image. The visual features used in CBIR systems can be classified into low level features (color, texture and shape) and high level features, which are usually obtained by combining low level features with a predefined model. High level features are not usually suitable for general purpose systems as they have a strong dependency on the application domain, so the extraction of good low level image descriptors is an important research activity in this field.

Nevertheless, although low level features can easily describe the content of simple images, complex images and high level concepts cannot be properly described. This gap between the high level concepts closer to human perception and the low level features used to describe images is called the semantic gap, and different methods have been proposed to deal with it [18], [20], [35], [8], [28]. In many cases, the proposed strategies are based on integrating the information provided by the user into the decision process. The procedure in these systems is as follows. First, the user selects from a set of images (resulting from a previous search) those considered to be relevant, according to his/her particular search criterion, and rejects those that are not. Based on this selection, the system learns and offers the user a new set of images closer to his/her search in the next step. This process is repeated until the user considers the offered result satisfactory. In this way, the user guides the search by indicating his/her preferences. Systems that use the information provided by the user to improve the results of a new query are said to have a relevance feedback mechanism.

Regarding this relevance feedback mechanism, several researchers have proposed different techniques and algorithms to achieve it [37], [6]. The classical approaches were inspired by techniques typically used in classical information retrieval, such as the systems proposed by Ishikawa [16], Rui [27] or Ciocca and Schettini [4]. These are based on moving the query point so that it appears closer to the relevant results and farther from the non-relevant results, and on updating the similarity measure according to the user's criteria. Other authors propose probabilistic techniques, mainly based on Bayesian frameworks, that incorporate user preferences [34], [30], [5], [7]. Another group of approaches is based on classification and clustering. Some of these use support vector machines, a set of supervised learning methods for classification that require positive and negative selections [36], [32], and others learn from positive selections only [3], [13], [31].
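
The query point movement idea can be illustrated with a Rocchio-style update, in which the new query is pulled towards the mean of the relevant images and pushed away from the mean of the non-relevant ones. The following is a minimal sketch only: the update formula and the coefficients `alpha`, `beta` and `gamma` are assumptions taken from classical information retrieval, not necessarily the formulations used in [16], [27] or [4], and all function names are hypothetical.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Component-wise mean; a zero vector is returned when `vectors` is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def move_query_point(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the current query (assumed value)
    beta: float = 0.75,   # attraction towards relevant images (assumed value)
    gamma: float = 0.25,  # repulsion from non-relevant images (assumed value)
) -> List[float]:
    """Rocchio-style update: alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(query)
    pos = mean_vector(relevant, dim)
    neg = mean_vector(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


# Toy two-dimensional feature vectors, purely illustrative.
query = [0.0, 0.0]
relevant = [[1.0, 1.0], [3.0, 1.0]]
non_relevant = [[-2.0, 0.0]]
new_query = move_query_point(query, relevant, non_relevant)
print(new_query)
```

In this toy example the updated query moves towards the two relevant vectors and away from the non-relevant one; a real system would also update the similarity measure, which is not shown here.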

Following this line of work, in this paper we present a new algorithm for relevance feedback in large image databases that belongs to the probabilistic techniques and is based on local regression models, improving on previous related works [19], [8]. In [19] an iterative relevance feedback scheme was proposed, based on logistic regression analysis, for ranking a set of images in decreasing order of their evaluated relevance probabilities. The low level image features are grouped into n subsets with semantically related characteristics, so that n smaller regression models have to be adjusted. Each of these models produces a different relevance probability, and these probabilities are combined by means of ordered weighted averaging (OWA) operators to rank the database according to the preferences of the user. In [8] a very drastic dimensionality reduction, down to 3 dimensions, is performed. Each image is described in terms of its distances to the different levels of user preference (relevant, neutral and non-relevant), and this feature vector changes in each iteration. After that, a single global logistic regression model is adjusted to this reduced vector and the associated relevance probabilities are computed.
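
For reference, an OWA operator sorts its arguments in decreasing order and then takes a weighted sum, so the weights are attached to rank positions rather than to particular models. The sketch below follows this standard definition; the weight vectors and probabilities are illustrative assumptions and are not the ones used in [19], and the function name `owa` is hypothetical.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights[j] multiplies the j-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * b for w, b in zip(weights, ordered))


# Relevance probabilities given by n = 3 hypothetical sub-models for one image.
probabilities = [0.2, 0.9, 0.5]

as_max = owa(probabilities, [1.0, 0.0, 0.0])        # all weight on the largest value
as_min = owa(probabilities, [0.0, 0.0, 1.0])        # all weight on the smallest value
as_mean = owa(probabilities, [1 / 3, 1 / 3, 1 / 3])  # plain arithmetic mean

# Ranking a tiny database by the aggregated probability (illustrative weights).
database = {"img_a": [0.2, 0.9, 0.5], "img_b": [0.6, 0.6, 0.6], "img_c": [0.1, 0.3, 0.2]}
ranking = sorted(database, key=lambda k: owa(database[k], [0.5, 0.3, 0.2]), reverse=True)
print(as_max, as_min, as_mean, ranking)
```

Depending on the weight vector, the operator ranges from the minimum to the maximum of the model outputs, which is what makes it a flexible way of combining several relevance probabilities.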

To improve these algorithms, in the present work we deal with the high dimension of the low-level feature vector by applying Principal Component Analysis (PCA) to the original feature vector [17], obtaining a transformed vector which explains at least 80% of the sample variability. The problem of the small sample size with respect to the number of features is solved by adjusting several partial generalized local regression models (rather than the global models used in [19], [8]), and combining their relevance probabilities by means of a weighted average in which the weights are related to the variability of the vector components used in each model. In this way, we have developed a relevance feedback procedure that is local in nature and is able to adapt dynamically to the sample size.
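
One possible reading of this scheme is sketched below with NumPy on synthetic data. It is not the authors' implementation: the grouping of principal components into consecutive blocks of `group_size`, the whitening of the scores, the small ridge penalty and the gradient descent fit are all assumptions introduced for illustration, and every function name is hypothetical.

```python
import numpy as np


def sigmoid(t):
    # Clipping avoids overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def pca_scores(X, min_explained=0.80):
    """Project X on the leading components explaining at least `min_explained`."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    k = min(int(np.searchsorted(np.cumsum(ratio), min_explained)) + 1, len(ratio))
    scores = Xc @ vt[:k].T
    scores = scores / scores.std(axis=0)  # whitening (assumption, for a stable fit)
    return scores, ratio[:k]


def fit_logistic(Z, y, l2=1e-2, lr=0.5, n_iter=500):
    """Logistic regression by gradient descent with a small ridge penalty (assumption)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (sigmoid(A @ w) - y) / len(y)
        grad[1:] += l2 * w[1:]  # the intercept is not penalised
        w -= lr * grad
    return w


def relevance_probabilities(scores, ratio, labelled_idx, labels, group_size=2):
    """Weighted average of local models; weights follow the explained variance."""
    n_comp = scores.shape[1]
    groups = [np.arange(i, min(i + group_size, n_comp)) for i in range(0, n_comp, group_size)]
    weights = np.array([ratio[g].sum() for g in groups])
    weights = weights / weights.sum()
    probs = np.zeros(scores.shape[0])
    for g, wt in zip(groups, weights):
        w = fit_logistic(scores[labelled_idx][:, g], labels)
        A = np.column_stack([np.ones(scores.shape[0]), scores[:, g]])
        probs += wt * sigmoid(A @ w)
    return probs


# Synthetic "database": 300 images, 12 features with decreasing spread.
rng = np.random.default_rng(0)
scales = np.array([4.0, 3.0, 2.0, 1.5] + [1.0] * 8)
X = rng.normal(size=(300, 12)) * scales
truth = (X[:, 0] > 0).astype(float)  # hidden user preference (toy rule)

scores, ratio = pca_scores(X)
labelled_idx = rng.choice(300, size=40, replace=False)  # images judged by the user
probs = relevance_probabilities(scores, ratio, labelled_idx, truth[labelled_idx])
ranking = np.argsort(-probs)  # images in decreasing order of estimated relevance
```

In a relevance feedback loop, the user would judge some of the top-ranked images, the labelled set would grow, and the local models would be re-estimated; under the assumptions of this sketch, `group_size` could then be increased as more judgements become available.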

The remainder of this paper is organized as follows. Section 2 gives a detailed explanation of the proposed method, highlighting the chosen low level features, the dimensionality reduction technique and the local model applied. Section 3 explains the experimental design and the two types of experiments performed, and compares the method with some recent similar algorithms in the literature. Finally, in Section 4 some conclusions are drawn and further extensions of the work are proposed.

Methodology

As introduced previously, we are concerned with CBIR in large databases. We try to model the random preferences of the user when querying an image database by using a stochastic model. Each image in the database is represented by a low level feature vector in a very high dimensional space. We therefore have to address the problem of dimensionality reduction in order to fit a suitable model that predicts the probability of an image being relevant to the user.

Although

ImageCLEF Wikipedia collection

The collection used for the experiments is the Wikipedia2011 collection, which is part of the ImageCLEF campaign. The main mission of this campaign is to promote research, innovation, and development of information access systems [23], proposing in each edition a task and providing a suitable benchmark to achieve the objectives. The ImageCLEF collections were built using the traditional TREC-style methodology to ensure representation, quantity, visual quality (high resolution, clarity and contrast) and

Conclusions and further work

In this paper, we present a novel algorithm for the incorporation of user preferences in an image retrieval system based exclusively on the visual content of the image, which is stored as a vector of low-level features. This procedure relies on local logistic regression models to bridge the semantic gap between the low-level features, used to describe each image, and user preferences. This is the heart of our approach. The main advantage of these models is the facility of incorporating the

Acknowledgement

This work has been partially supported by projects DPI2013-45742-R, DPI2013-47279-C2-1-R and TIN2013-47090-C3-1-P from the Spanish government.

Esther de Ves was born in Almansa (Spain). She received the M.S. degree in Physics and the Ph.D. in Computer Science from the University of Valencia in 1993 and 1999, respectively. Since 1994 she has been with the Department of Computer Science at the University of Valencia, where she is an Assistant Professor. Her current interests are in the areas of texture analysis and multimedia database retrieval.

References (37)

  • Ritendra Datta, Dhiraj Joshi, Jia Li, James Z. Wang, Technical report cse 06–009. Image Retrieval: Ideas, Influences,...
  • A. Del Bimbo, Visual Information Retrieval (1999)
  • Thomas Deselaers et al., Features for image retrieval: an experimental comparison, Inf. Retrieval (2008)
  • Jerome Friedman et al., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw. (2010)
  • Theo Gevers, Arnold Smeulders, The pictoseek www image search system, in: Proceedings of the IEEE International...
  • Iker Gondra et al., Improving image retrieval performance by inter-query learning with one-class support vector machines, Neural Comput. Appl. (2004)
  • Ruben Granados, Joan Benavent, Xaro Benavent, Esther de Ves, Ana García-Serrano, Multimodal information approaches for...
  • Rob Hess, An open-source SIFT library, in: Proceedings of the International Conference on Multimedia, MM '10, ACM, New...

Xaro Benavent-García was born in Valencia (Spain). She received the M.S. degree in Computer Science from the Polytechnic University of Valencia in 1994, and the Ph.D. in Computer Science from the University of Valencia in 2001. Since 1996 she has been with the Department of Computer Science at the University of Valencia, where she is an Assistant Professor. Her current interests are in the areas of image database retrieval and multimodal fusion algorithms.

Inmaculada Coma Tatay graduated in Physics from the University of Valencia and received her Ph.D. in Computer Engineering from the same university. She is currently a Lecturer at the University of Valencia, where she was previously a research fellow and assistant professor.

Guillermo Ayala was born in Cartagena (Spain) in 1962. He graduated in Mathematics (1985) and received his Ph.D. in Statistics (1988), both at the University of Valencia. He was a scholarship holder at CMU S. Juan de Ribera (Burjasot, Spain) from 1979 to 1987. He is currently a professor at the Department of Statistics and Operations Research of the University of Valencia, Spain. His research interests are in the areas of medical image analysis and applications of stochastic geometry in computer vision.
