Improving metaheuristics convergence properties in inductive query by example using two strategies for reducing the search space

https://doi.org/10.1016/j.cor.2005.05.001

Abstract

In this paper we present two strategies for reducing the convergence time of two metaheuristics (genetic algorithms and simulated annealing) in inductive query by example (IQBE), a process that assists the users of an information retrieval system in formulating queries. Both strategies are based on reducing the size of the metaheuristics' search space. The first is a compression–expansion strategy, in which the terms are arranged into sets of a given size. The second is the so-called restricted search, in which the number of terms in the genetic algorithm and simulated annealing encodings is fixed to a given number m. We describe how the strategies are implemented, analyze when they can succeed, and discuss their main drawbacks.

Introduction

In recent years, emergent and nature-inspired algorithms have been studied extensively and applied to information retrieval problems [1], [2], [3], [4]. Most of the information retrieval applications tackled with evolutionary approaches concern relevance feedback (RF) or inductive query by example (IQBE) [3]. RF is a query optimization method that uses the information provided by the user after he or she has seen an initial retrieval by the system. In IQBE, by contrast, no feedback from the user is needed, and the query optimization relies entirely on off-line machine learning algorithms. In this paper we focus on the IQBE approach.

IQBE is a process for assisting users in the formulation of queries by applying machine learning methods. IQBE, initially proposed by Chen et al. [2], applies an off-line learning algorithm to identify relevant documents starting from an initial set of documents provided by the user. IQBE has been the subject of intensive research in recent years, since it is a paradigm that may prove very useful for meeting users' needs in future information retrieval systems.

The literature contains many different approaches to IQBE, most of them based on metaheuristic algorithms such as genetic algorithms (GA) [5], genetic programming (GP) [6], simulated annealing (SA) [7], or hybrid techniques combining several of these algorithms. The first work on IQBE was that of Chen et al. [2], where a standard GA with binary encoding and fixed size was tested and compared with an SA approach to IQBE. Since this first work, GAs have been extensively applied to improve query definition, for example in [8], [9], [10], [11]. There are also approaches based on GP, for example [12], [13], and on SA [14], where a hybrid simulated annealing and GP approach was presented.

Despite the extensive work on improving metaheuristic algorithms for IQBE, several problems related to algorithm performance (specifically in GA and SA) remain unresolved, such as the poor results obtained when designing large queries, and convergence issues, mainly the time needed to converge. This last issue has not been fully considered in the literature until now, since most works focus on improving the performance of the algorithms in terms of the fitness function considered, whereas the computational cost and running time of the algorithms have been treated as secondary.

In this paper we propose two techniques for reducing the convergence time of metaheuristics (GAs and SA) in Boolean IQBE by reducing the metaheuristics' search space. We consider the standard GA and SA algorithms, both using binary encoding, with the standard operators: selection, crossover and mutation in the GA [5], and bit mutation in SA [7]. Within this framework, we apply two techniques that reduce the convergence time of the algorithms. The idea is to test the two proposed techniques on simple, standard GA and SA algorithms applied to IQBE problems, comparing the performance and convergence time of the algorithms with and without the proposed techniques.
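To make the baseline concrete, the following is a minimal sketch of SA with binary encoding and bit mutation for IQBE. It rests on assumptions that are not taken from the paper: the query is treated as a disjunction (OR) of the terms whose bit is 1, the fitness is the F1 measure of the retrieved set against the user's relevant documents, and the cooling schedule is geometric. The names `retrieve`, `f_measure` and `simulated_annealing`, and the parameters `t0`, `alpha` and `steps`, are hypothetical.

```python
import math
import random


def retrieve(query_bits, terms, docs):
    """Return the ids of documents matched by an OR-query over the selected terms.

    Assumption: the query is a disjunction of the terms whose bit is 1;
    the Boolean query structure used in the paper may differ.
    """
    selected = {t for t, bit in zip(terms, query_bits) if bit}
    return {doc_id for doc_id, words in docs.items() if selected & words}


def f_measure(retrieved, relevant):
    """Hypothetical fitness: F1 of the retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def simulated_annealing(terms, docs, relevant, t0=1.0, alpha=0.95,
                        steps=2000, seed=0):
    """Maximize F1 using single-bit-flip moves and geometric cooling."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in terms]
    cur_fit = f_measure(retrieve(current, terms, docs), relevant)
    best, best_fit = current[:], cur_fit
    temp = t0
    for _ in range(steps):
        candidate = current[:]
        candidate[rng.randrange(len(terms))] ^= 1  # bit mutation: add/remove one term
        cand_fit = f_measure(retrieve(candidate, terms, docs), relevant)
        delta = cand_fit - cur_fit
        # Always accept non-worsening moves; accept worse ones with Metropolis probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fit = candidate, cand_fit
            if cur_fit > best_fit:
                best, best_fit = current[:], cur_fit
        temp = max(temp * alpha, 1e-6)  # keep the temperature strictly positive
    return best, best_fit
```

For example, with `terms = ["wing", "flow", "heat", "boundary", "layer"]`, a toy `docs` dictionary mapping document ids to term sets, and `relevant = {1, 3}`, calling `simulated_annealing(terms, docs, relevant)` returns the best bit string found and its F1 value. A GA would use the same encoding and fitness, replacing the single-flip move with selection, crossover and mutation over a population.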

First, we propose a compression–expansion strategy. Its objective is to evolve several bits at the same time, so that, if the terms evolved together are significant enough, the algorithm converges faster than it would without the compression scheme. In a second step, our algorithm refines the search in the expanded encoding, after removing the terms that were eliminated in the compressed one. The compression–expansion encoding strategy proposed in this paper is inspired by the building block hypothesis in GAs [5]. According to this hypothesis, a GA constructs good solutions from small, high-fitness sub-solutions. If, for a given problem, the building blocks can be identified in advance, evolving them together considerably improves the performance of the algorithm on that problem. The main difficulty is that, for a general problem, it is nearly impossible to identify the building blocks in advance. IQBE, however, is a special case, since the building blocks should be formed by the terms that are significant for the user.
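The two phases described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: terms are grouped into consecutive sets of size k (the paper does not say here how sets are formed), a single-bit-flip hill climber stands in for the GA or SA, and phase 2 starts from the expanded phase-1 solution. All names (`make_groups`, `expand`, `hill_climb`, `compression_expansion`) are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Bits = List[int]
Fitness = Callable[[Bits], float]


def make_groups(n_terms: int, k: int) -> List[List[int]]:
    """Arrange term indices 0..n_terms-1 into consecutive sets of size k (last may be smaller)."""
    return [list(range(i, min(i + k, n_terms))) for i in range(0, n_terms, k)]


def expand(group_bits: Sequence[int], groups: List[List[int]], n_terms: int) -> Bits:
    """Copy each set's bit onto every term in that set (compressed -> full encoding)."""
    full = [0] * n_terms
    for bit, group in zip(group_bits, groups):
        for t in group:
            full[t] = bit
    return full


def hill_climb(fitness: Fitness, length: int, init: Optional[Bits] = None,
               steps: int = 500, seed: int = 0) -> Bits:
    """Hypothetical stand-in optimizer: single-bit-flip hill climbing.
    In the paper's setting a GA or SA would play this role."""
    rng = random.Random(seed)
    x = list(init) if init is not None else [rng.randint(0, 1) for _ in range(length)]
    fx = fitness(x)
    for _ in range(steps):
        y = x[:]
        y[rng.randrange(length)] ^= 1
        fy = fitness(y)
        if fy >= fx:  # accept improving and neutral moves
            x, fx = y, fy
    return x


def compression_expansion(fitness: Fitness, n_terms: int, k: int,
                          optimize=hill_climb) -> Bits:
    # Phase 1 (compression): one bit per set of k terms, evolved together.
    groups = make_groups(n_terms, k)
    compressed = optimize(lambda g: fitness(expand(g, groups, n_terms)), len(groups))
    # Terms whose set was switched off are eliminated from phase 2.
    kept = [t for group, bit in zip(groups, compressed) if bit for t in group]
    if not kept:
        return [0] * n_terms

    def to_full(bits: Sequence[int]) -> Bits:
        full = [0] * n_terms
        for t, b in zip(kept, bits):
            full[t] = b
        return full

    # Phase 2 (expansion): refine term by term, starting from the expanded phase-1 solution.
    refined = optimize(lambda b: fitness(to_full(b)), len(kept), [1] * len(kept))
    return to_full(refined)
```

The sketch also shows the drawback anticipated by the building block argument: if a significant term shares a set with mostly useless terms, phase 1 may switch the whole set off, and phase 2 can no longer recover that term.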

The second technique we present in this paper is the so-called restricted search, an idea that has been successfully applied to the binary feature selection problem (FSP) in classification [15], [16]. The FSP consists of removing the meaningless features of a given classification problem, which only introduce noise into the classifier without providing any useful information for the classification task. The FSP can be tackled by metaheuristics such as GAs or SA using a binary string encoding, in which a 1 means that a given feature is considered by the classifier, whereas a 0 means that the feature is removed from the training set. The restricted search for the FSP then consists of fixing the number of features considered by the classifier, introducing operators that remove or add 1s in the binary strings of the GA (see [16] for details on restricted search for the FSP). In this paper we apply the restricted search to the IQBE problem. We will show that this technique is able to improve the performance of a GA and an SA for IQBE in terms of convergence time to the optimal solution.
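The operator that removes or adds 1s can be sketched as a repair step applied after crossover and mutation, so that every chromosome encodes exactly m terms. This is a minimal sketch assuming the bits to flip are chosen uniformly at random; the operator in [16] may choose them differently (for example, guided by fitness). The names `restrict` and `offspring` and the mutation rate `p_mut` are hypothetical.

```python
import random
from typing import List


def restrict(bits: List[int], m: int, rng: random.Random) -> List[int]:
    """Return a copy of `bits` with exactly m ones.

    Assumption: surplus 1s are removed, or missing 1s added, uniformly at random.
    """
    if not 0 <= m <= len(bits):
        raise ValueError("m must be between 0 and the chromosome length")
    out = bits[:]
    ones = [i for i, b in enumerate(out) if b]
    zeros = [i for i, b in enumerate(out) if not b]
    p = len(ones)
    if p > m:    # too many terms: switch some of them off
        for i in rng.sample(ones, p - m):
            out[i] = 0
    elif p < m:  # too few terms: switch some unused terms on
        for i in rng.sample(zeros, m - p):
            out[i] = 1
    return out


def offspring(parent_a: List[int], parent_b: List[int], m: int,
              rng: random.Random, p_mut: float = 0.01) -> List[int]:
    """One-point crossover and bit mutation, followed by the restricted-search repair."""
    cut = rng.randrange(1, len(parent_a))  # requires chromosomes of length >= 2
    child = parent_a[:cut] + parent_b[cut:]
    child = [b ^ 1 if rng.random() < p_mut else b for b in child]
    return restrict(child, m, rng)
```

In SA, the same repair can follow each bit mutation. An alternative design is a swap move that turns one 1 off and one 0 on, which preserves the count of 1s without needing a separate repair step.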

The rest of the paper is structured as follows: Section 2 presents the most important previous work related to IQBE and metaheuristics. Section 3 describes the compression–expansion strategy and the restricted search technique for improving the performance of GAs and SA in IQBE. In Section 4 we show, through computational experiments, the possibilities that these techniques offer for improving the metaheuristics considered. Section 5 concludes the paper with some final remarks.


Previous approaches

IQBE is a process for assisting the users of an information retrieval system in the formulation of queries, based on the application of artificial intelligence methods. It was first proposed by Chen et al. [2], who presented a GA for learning the terms that best represent a relevant document provided by the user. The same paper also presented an SA algorithm as an alternative way of implementing an IQBE process. This first framework for IQBE was extended in order to

Improving metaheuristics convergence performance for IQBE

In this section we describe the two strategies presented for reducing the search space in IQBE problems. First we present the compression–expansion encoding strategy, and second, we introduce the restricted search strategy for IQBE.
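A quick count shows how much each strategy shrinks the space of candidate encodings. The sketch below assumes n candidate terms, sets of size k for the compression phase, and exactly m selected terms for the restricted search; for compression–expansion it counts only phase 1, since phase 2 adds a further search over the 2^(surviving terms) encodings. The function name `search_space_sizes` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import ceil, comb


def search_space_sizes(n_terms: int, k: int, m: int) -> dict:
    """Number of candidate encodings under each scheme (phase 1 only for compression)."""
    return {
        "full": 2 ** n_terms,                  # any subset of the n terms
        "compressed": 2 ** ceil(n_terms / k),  # one bit per set of k terms
        "restricted": comb(n_terms, m),        # subsets with exactly m terms
    }


for scheme, size in search_space_sizes(100, 5, 10).items():
    print(f"{scheme:>10}: {size:.3e}")
```

With n = 100, k = 5 and m = 10, the full space has 2^100 (about 1.27 × 10^30) encodings, the compressed phase has 2^20 = 1,048,576, and the restricted space is on the order of 10^13.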

Computational tests and analysis

In order to test our proposed techniques, we have used the well-known Cranfield document collection. We chose this collection because it has been used to test performance in most works dealing with metaheuristics for IQBE [3], [8], [18], which makes results easy to compare. The Cranfield collection has a total of 1398 documents related to all aspects of aeronautical engineering, with 225 associated queries. In order to test our techniques, we

Conclusions

In this paper we have presented and analyzed two strategies for reducing the time of convergence of two metaheuristics (genetic algorithms and simulated annealing) in IQBE. The first strategy that we have presented is a compression–expansion strategy, where the terms are arranged into sets of a given size. We have shown that if the significant terms of the query can be put together in the sets, the performance of the metaheuristics is considerably improved. However, if the significant terms are

Acknowledgements

The authors are grateful for the comments on the paper by the anonymous reviewers and Professor G. Laporte, which have helped to improve the quality of this article.

References (20)

  • M. Caramia et al.

    Improving search results with data mining in a thematic search engine

    Computers & Operations Research

    (2004)
  • H. Chen et al.

    A machine learning approach to inductive query by examples: an experiment using relevance feedback genetic algorithms and simulated annealing

    Journal of the American Society for Information Science

    (1998)
  • O. Cordón et al.

    A review on the application of evolutionary computation to information retrieval

    International Journal of Approximate Reasoning

    (2003)
  • S. Mitra et al.

    Data mining in soft computing framework: a survey

    IEEE Transactions on Neural Networks

    (2002)
  • D. Goldberg

    Genetic algorithms in search, optimization and machine learning

    (1989)
  • J. Koza

    Genetic programming

    (1992)
  • S. Kirkpatrick et al.

    Optimization by simulated annealing

    Science

    (1983)
  • C. López-Pujalte et al.

    A test of genetic algorithms in relevance feedback

    Information Processing & Management

    (2002)
  • A.M. Robertson et al.

    An upperbound to the performance for ranked-output searching: optimal weighting of query terms using a genetic algorithm

    Journal of Documentation

    (1996)
  • L. Tamine et al.

    Multiple query evaluation based on an enhanced genetic algorithm

    Information Processing & Management

    (2003)
There are more references available in the full text version of this article.
