Elsevier

Neurocomputing

Volume 166, 20 October 2015, Pages 26-37
The effect of low-level image features on pseudo relevance feedback

https://doi.org/10.1016/j.neucom.2015.04.037

Abstract

Relevance feedback (RF) is a technique widely used to improve the effectiveness of traditional content-based image retrieval systems. However, users must provide relevant and/or irrelevant images as feedback for their queries, which is a tedious task. To alleviate this problem, pseudo relevance feedback (PRF) can be utilized. It not only automates the manual component of RF, but can also provide reasonably good retrieval performance. Specifically, it is assumed that a fraction of the top-ranked images in the initial search results are pseudo-positive. The Rocchio algorithm is a classic approach for the implementation of RF/PRF, which is based on query vector modification. The aim is to produce a new query vector by taking the weighted sum of the original query and the mean vectors of the relevant and irrelevant sets. Image feature representation is the key factor affecting PRF performance. This study is the first to examine the retrieval performance of 63 different image feature descriptors, ranging from 64 to 10426 dimensions, in the context of PRF. Experimental results are obtained on the NUS-WIDE dataset, which contains 22156 Flickr images associated with 69 concepts. It is shown that the combination of color moments, edges, wavelet textures, and locality-constrained linear coding of the bag-of-words model provides the optimal feature representation, giving relatively good retrieval effectiveness and reasonably good retrieval efficiency for Rocchio-based PRF.

Introduction

Advances in computer and multimedia technologies have allowed the production of digital images and the creation of large repositories for image storage at little cost. This has led to a rapid increase in the size of image collections for many purposes, including digital libraries, medical imaging, art and museum collections, journalism, advertising, home photo archives, and so on. Clearly, it is now necessary to design automated image retrieval systems which can operate on a large scale.

The traditional image retrieval approach is based on manual image indexing with keywords assigned to images by human indexers during the database creation stage. Relevant images can be retrieved by using the indexed keywords as queries. However, there are some limitations to manual indexing. For example, it is a very time-consuming and expensive process, especially when the size of the image collection is very large, e.g., hundreds of thousands of images [24]. In addition, different indexers may assign different keywords to the same images, or the same indexer may perform differently under different circumstances and at different times. Furthermore, during retrieval, users may not be aware of, or may not agree with, the indexed keywords or terms for queries, which can lead to unsatisfactory retrieval results.

Content-Based Image Retrieval (CBIR) [30], which was proposed in the early 1990s, is a technique for automatically indexing images by extracting (low-level) visual features, such as color, texture, and shape. The retrieval of images is based solely upon the indexed image features. Therefore, it is hypothesized that relevant images can be retrieved by calculating the similarity between the low-level image contents through browsing, navigation, query-by-example, and so on. Typically, images are represented as points in a high-dimensional feature space. Then, a metric is used to measure the degree of dis/similarity between images in this space. Thus, images lying close to the query in this space are treated as similar to the query and retrieved. Although CBIR introduced automated image feature extraction and indexing, it did not overcome the so-called semantic gap, which is described in greater detail below.
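The query-by-example step described above can be illustrated with a minimal sketch. It assumes Euclidean distance as the dis/similarity metric and uses made-up three-dimensional feature vectors; neither the metric nor the values are taken from the paper, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database):
    """Return database indices ordered from most to least similar to the query."""
    distances = [euclidean_distance(query, vec) for vec in database]
    return sorted(range(len(database)), key=lambda i: distances[i])


# Toy feature vectors (illustrative values only, not real image descriptors).
database = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.8, 0.2, 0.1],
]
query = [1.0, 0.0, 0.0]

# Indices of the database images, nearest to the query first.
ranking = rank_by_similarity(query, database)
print(ranking)
```

Any other metric could be substituted for `euclidean_distance` without changing the ranking logic; only the ordering of the returned indices depends on it.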

The semantic gap is the gap between the low-level features extracted and indexed by the computer and the high-level concepts (or semantics) of a user’s queries. In other words, automated CBIR systems do not allow ready matching to the users’ requests. The notion of similarity in the user’s mind is typically based on high-level abstractions, such as activities, entities/objects, events, or some evoked emotions, among others. In this situation, retrieval by similarity using low-level features like color or shape will not be very effective. That is, human similarity judgments do not obey the requirements of the similarity metric used in CBIR systems. In addition, general users usually find it difficult to search or query images by using color, texture, and/or shape features only. They tend to prefer textual or keyword-based queries since these are easier to use and allow their information needs to be represented more intuitively [30].

One method applied to address the semantic gap problem, which limits the retrieval effectiveness of CBIR systems, is relevance feedback (RF) [1], [23]. RF is a technique from traditional text-based information retrieval systems: a (supervised active learning) process intended to improve system performance by refining the results of the original queries. The main idea, in the so-called user-in-the-loop version, is to ask users to provide positive (relevant) and negative (irrelevant) examples as feedback on the initially retrieved document sets. RF can also be applied in image retrieval so that some retrieved images can be selected to receive positive or negative feedback for query refinement. In principle, RF is based on learning a set of ‘optimal’ feature weights for a query, or moving the query point toward the relevant images [40].

However, the duration of the iterative process required to meet the users’ needs varies depending on the relevance feedback algorithms used, the image database, and the users. In addition, it can be a tedious task to provide relevant and/or irrelevant images to the system. Pseudo relevance feedback (PRF) can be utilized to alleviate these problems. PRF automates the manual part of RF, so that users get improved retrieval performance without an extended interaction. Specifically, it is assumed that a fraction of the top-ranked images in the initial search results are pseudo-positive [37], [38]. One closely related concept is the recursive matching scheme used in facial image retrieval [15].
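As a rough illustration of the pseudo-positive assumption, the sketch below takes an initial ranking (a list of image indices, best match first) and labels its first few entries as pseudo-positive without asking the user. The function name and the `top_k` parameter are hypothetical, and expressing the "fraction" as a fixed count is an assumption made here for simplicity.

```python
def select_pseudo_positive(ranking, top_k):
    """Treat the first `top_k` entries of an initial ranking as pseudo-positive.

    `ranking` lists image indices from most to least similar to the query.
    No user judgment is involved; some selected images may in fact be irrelevant.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return list(ranking[:top_k])


# Hypothetical initial ranking of six images returned by a first search pass.
initial_ranking = [4, 0, 5, 2, 1, 3]
pseudo_positive = select_pseudo_positive(initial_ranking, top_k=2)
print(pseudo_positive)
```

The selected images would then be fed back to the system in place of user-supplied relevant examples.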

Although some of the top-retrieved documents may contain noise, PRF has been shown to outperform several other RF algorithms and it has been recognized as the baseline for RF methods [4], [18], [8].

The Rocchio algorithm is the classic algorithm used for the implementation of RF/PRF, modeling a way of incorporating RF information into the vector space model [25]. In particular, it focuses on query vector modification where a new query vector is produced by taking the weighted sum of the original query and the mean vectors of the relevant and irrelevant sets (see Section 2.2).
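The query modification just described can be sketched in a few lines. This follows the standard textbook form of the Rocchio update, in which the mean of the irrelevant set is subtracted so that the query moves away from it. The function names are hypothetical, and the default weights (`alpha=1.0`, `beta=0.75`, `gamma=0.15`) are commonly quoted textbook values, not the parameters used in this paper.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a set of vectors; the zero vector if the set is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*query + beta*mean(relevant) - gamma*mean(irrelevant).

    The default weights are illustrative assumptions, not values from the paper.
    """
    dim = len(query)
    mean_rel = mean_vector(relevant, dim)
    mean_irr = mean_vector(irrelevant, dim)
    return [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, mean_rel, mean_irr)
    ]


# Toy two-dimensional example with made-up feature values.
new_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[1.0, 1.0], [3.0, 1.0]],
    irrelevant=[[0.0, 2.0]],
    alpha=1.0,
    beta=0.5,
    gamma=0.5,
)
print(new_query)
```

In a PRF setting the `relevant` argument would hold the pseudo-positive images from the initial ranking; passing an empty list for `irrelevant` reduces the update to positive-only feedback.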

Clearly, RF/PRF performance is heavily dependent on image feature representation. Past studies in the image retrieval literature have shown that various visual features can be extracted for image content analysis and indexing [30], such as combined color and texture features [9] or the bag-of-words method [14]. Although some comparative studies have analyzed the correlation of various low-level features and their application in image retrieval [7], [22], their focus has not been on the relevance feedback problem. In other words, very few studies have compared feature representations in terms of PRF. Therefore, the aim of this paper is to answer the question of which type of image feature representation performs best for PRF, as determined by the Rocchio algorithm. Moreover, the query response (or execution) time for modifying the query vector is also critical for interactive image retrieval applications, such as web searching. The computational cost of using feature representations with different dimensionalities is likely to differ. This has also motivated us to further examine the retrieval efficiency of PRF using different image feature representations.

To sum up, the contributions of this paper are two-fold. First, the findings of this study allow us to identify the optimal feature representation(s) that can provide better retrieval effectiveness and efficiency, which has not been done before in PRF. Second, the optimal feature representation(s) can be used as the baseline for future related work using PRF.

The rest of this paper is organized as follows. Section 2 overviews several well-known image feature representation methods, and Section 3 briefly describes the Rocchio algorithm. Section 4 presents the experimental setup and results. Finally, some conclusions are given in Section 5.

Section snippets

Color

Color is the most commonly used visual feature for image retrieval due to the computational efficiency of its extraction. All colors can be represented by variable combinations of the three so-called additive primary colors: red (R), green (G), and blue (B).

A color histogram [33] is one common method used to represent color contents for indexing and retrieval. It shows the proportion of pixels of each color within the image, which is represented by the distribution of the number of pixels for
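A color histogram of the kind introduced above can be sketched as follows. This is a minimal illustration assuming 8-bit RGB pixels and a joint histogram with a small, arbitrary number of bins per channel; the function name and the binning scheme are assumptions and are not taken from the paper or from [33].

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    Each entry of the result is the proportion of pixels falling in one color bin.
    """
    pixels = list(pixels)
    bins = bins_per_channel
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        rb = r * bins // 256
        gb = g * bins // 256
        bb = b * bins // 256
        hist[(rb * bins + gb) * bins + bb] += 1.0
    total = len(pixels)
    if total == 0:
        return hist
    return [count / total for count in hist]


# Four made-up pixels: two red, one blue, one green.
toy_image = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
histogram = color_histogram(toy_image, bins_per_channel=2)
print(histogram)
```

The resulting vector has `bins_per_channel ** 3` entries, so the bin count directly controls the feature dimensionality.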

The Rocchio algorithm

The Rocchio algorithm can be regarded as a query vector modification approach that iteratively reformulates the query vector based on the feedback set in order to move the query toward a topological region of more relevant images and away from irrelevant ones [25]. It reformulates the query as a modified query $q_m$ by

$$q_m = \alpha q_0 + \beta \frac{1}{|D_r|} \sum_{d_j \in D_r} d_j - \gamma \frac{1}{|D_{nr}|} \sum_{d_j \in D_{nr}} d_j$$

where $q_0$ is the original query vector, $D_r$ and $D_{nr}$ are the sets of known relevant and non-relevant images, respectively, and $\alpha$, $\beta$, and $\gamma$ are

Experiments

This section is made up of two parts. First, we introduce the experimental setup, including the dataset used, the low-level features extracted, the Rocchio parameters employed, and the evaluation metric considered. Second, relevant experimental results are presented, namely retrieval precision and query response time. In addition, the identified top feature representations are further compared using different Rocchio parameters, and their retrieval performance over the object and scene classes

Conclusion

Pseudo relevance feedback (PRF) is an approach for automating the manual component of the traditional relevance feedback process. The top-ranked images retrieved are used for pseudo positive/negative feedback, even though some of them may be noisy. CBIR systems using PRF can usually provide better retrieval performance than those without PRF.

The Rocchio algorithm is a classic algorithm used in the implementation of PRF, which is based on the query vector modification

References (40)

  • K. Collins-Thompson et al., Estimation and use of uncertainty in pseudo-relevance feedback, in: ACM SIGIR Conference on Research and Development in Information Retrieval (2007)
  • I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics,... (1992)
  • J. Deng et al., What does classifying more than 10,000 image categories tell us?, Eur. Conf. Comput. Vis. (2010)
  • T. Deselaers et al., Features for image retrieval: an experimental comparison, Inf. Retrieval (2008)
  • J.V. Dillon, K. Collins-Thompson, A unified optimization framework for robust pseudo-relevance feedback algorithms, in:...
  • L. Duan et al., Improving web image search by bag-based reranking, IEEE Trans. Image Process. (2011)
  • Y. Gao et al., Visual-textual joint relevance learning for tag-based social image search, IEEE Trans. Image Process. (2013)
  • J. Huang et al., Image indexing using color correlograms, IEEE Int. Conf. Comput. Vis. Pattern Recogn. (1997)
  • Y.-G. Jiang et al., Representations of keypoint-based semantic concept detection: a comprehensive study, IEEE Trans. Multimedia (2010)
  • C.H.A. Koster et al., On the importance of parameter tuning in text categorization

    Dr. Wei-Chao Lin is an assistant professor at the Department of Computer Science and Information Engineering, Hwa Hsia University of Technology, Taiwan. His research interests are machine learning and artificial intelligence applications.

    Mr. Zong-Yao Chen is currently a Ph.D. student at the Department of Information Management, National Central University, Taiwan. His research interests cover image processing, data mining and its applications.

    Dr. Shih-Wen Ke is an assistant professor at the Department of Information and Computer Engineering, Chung Yuan Christian University, Taiwan. His research covers information retrieval, machine learning, and data mining.

    Dr. Chih-Fong Tsai received a Ph.D. from the School of Computing and Technology, University of Sunderland, UK, in 2005. He is now a professor at the Department of Information Management, National Central University, Taiwan. His current research focuses on multimedia information retrieval and data mining.

    Dr. Wei-Yang Lin is an associate professor at the Department of Computer Science and Information Engineering, National Chung Cheng University, Taiwan. His research covers computer vision and image processing.