Improving metaheuristics convergence properties in inductive query by example using two strategies for reducing the search space

doi:10.1016/j.cor.2005.05.001

Computers & Operations Research

Volume 34, Issue 1, January 2007, Pages 91-106

Abstract

In this paper we present two strategies for reducing the time of convergence of two metaheuristics (genetic algorithms and simulated annealing) in inductive query by example (IQBE), which is a process for assisting the users of a given information retrieval system in the formulation of queries. Both strategies are based on a reduction of the search space size of the metaheuristics. The first strategy that we introduce is a compression–expansion strategy, where the terms are arranged into sets of a given size. The second strategy we consider is the so-called restricted search, where the number of terms in the genetic algorithm and simulated annealing encodings is fixed to be a given number m. We describe the implementation of the strategies and analyze when they can be successful, and the main drawbacks associated with them.

Introduction

In recent years, emergent and nature-based algorithms have been widely studied and applied to information retrieval problems [1], [2], [3], [4]. Most of the information retrieval applications tackled with evolutionary-type approaches are related to relevance feedback (RF) or to inductive query by example (IQBE) [3]. RF is a query optimization method that uses the information provided by the user after he or she has seen a first retrieval by the system. In IQBE, by contrast, no feedback is needed from the user, and the query optimization is based entirely on off-line machine learning algorithms. In this paper we focus on the IQBE approach.

IQBE is a process for assisting users in the formulation of queries, based on the application of machine learning methods. IQBE, initially proposed by Chen et al. [2], applies an off-line learning algorithm in order to find a set of relevant documents among the initial set of documents provided by a user. IQBE has been the subject of intensive research in the last few years, since it is a paradigm which might be very useful for fulfilling users' needs in future information retrieval systems.

Many different approaches to IQBE can be found in the literature, the majority of them based on metaheuristic algorithms such as genetic algorithms (GA) [5], genetic programming (GP) [6], simulated annealing (SA) [7] or hybrid techniques involving several of these algorithms. The first work on IQBE was that of Chen et al. [2], where a standard GA with binary encoding and fixed size was tested and compared with an SA approach to IQBE. Since this first work, GAs have been widely applied to improve query definition, for example in [8], [9], [10], [11]. There are also approaches based on GP, for example [12], [13], and on SA [14], where a hybrid simulated annealing-GP algorithm has been presented.

In spite of the considerable work carried out on improving metaheuristic algorithms for IQBE, there are still several unresolved problems related to algorithm performance (specifically in GA and SA), such as the poor results obtained in the design of large queries, and convergence issues, mainly the time needed for convergence. This last point has not been fully considered in the literature until now, since the majority of the works focus on improving the performance of the algorithms in terms of the fitness function considered, whereas the computational cost and time consumption of the algorithms have been treated as a secondary issue.

In this paper we propose two different techniques for reducing the convergence time of metaheuristics (GAs and SA) in Boolean IQBE, by means of a reduction of the metaheuristics' search space. We consider the standard GA and SA algorithms, both using binary encoding, with the standard operators: selection, crossover and mutation in the GA [5], and bit mutation in the case of SA [7]. In this framework, we apply two techniques which allow a reduction of the convergence time of the algorithms. The idea is to test the two proposed techniques on simple, standard GA and SA algorithms applied to IQBE problems, by comparing the performance and convergence time of the algorithms with and without the proposed techniques.

First, we propose the use of a compression–expansion strategy. The objective of this strategy is to evolve several bits at the same time, in such a way that, if the terms evolved together are significant enough, the algorithm is able to converge faster than the algorithms without the compression scheme. In a second step, our algorithm refines the search in the expanded encoding, after removing the terms which have been eliminated in the compressed one. The compression–expansion encoding strategy proposed in this paper is inspired by the building block hypothesis in GAs [5]. According to the building block hypothesis, a GA constructs good solutions from small, high-fitness sub-solutions. If, for a given problem, the building blocks can be identified in advance, evolving them together will considerably improve the performance of the algorithm on that particular problem. The main difficulty is that, for a general problem, it is nearly impossible to identify the building blocks in advance. IQBE, however, is a special case, since the building blocks should be formed by those terms which are significant for the user.
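The two steps of the compression–expansion idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one bit per term in the expanded encoding and one bit per set of terms in the compressed encoding, and it groups consecutive term indices into sets, which is an arbitrary choice made only for the example (this section does not state how terms are assigned to sets). The function names `group_terms`, `expand` and `surviving_terms` are hypothetical.

```python
from typing import List, Sequence


def group_terms(n_terms: int, group_size: int) -> List[List[int]]:
    """Partition term indices 0..n_terms-1 into consecutive sets.

    Assumption: consecutive grouping; any other assignment of terms
    to sets would work the same way with the functions below.
    """
    return [
        list(range(start, min(start + group_size, n_terms)))
        for start in range(0, n_terms, group_size)
    ]


def expand(compressed: Sequence[int], groups: List[List[int]], n_terms: int) -> List[int]:
    """Map a compressed string (one bit per set) to one bit per term."""
    expanded = [0] * n_terms
    for bit, group in zip(compressed, groups):
        if bit:
            for term in group:
                expanded[term] = 1
    return expanded


def surviving_terms(compressed: Sequence[int], groups: List[List[int]]) -> List[int]:
    """Terms kept for the second (refinement) step.

    Terms whose set was switched off in the compressed search are removed.
    """
    return [term for bit, group in zip(compressed, groups) if bit for term in group]


groups = group_terms(n_terms=7, group_size=3)
best_compressed = [1, 0, 1]  # illustrative result of the compressed search
print(expand(best_compressed, groups, 7))
print(surviving_terms(best_compressed, groups))
```

In this sketch the refinement step would then run the same GA or SA over a binary string with one bit per surviving term only, which is shorter than the original encoding whenever at least one set has been eliminated.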

The second technique we present in this paper is the so-called restricted search. The restricted search is an idea that has been successfully applied to the binary feature selection problem (FSP) in classification [15], [16]. The FSP consists of removing the meaningless features of a given classification problem, which only introduce noise to the classifier, without giving any helpful information to perform the classification task. The FSP can be tackled by means of metaheuristics such as GAs or SA, by using a binary string encoding. Using this encoding, a 1 stands for a given feature to be considered for the classifier, whereas a 0 in the encoding means that the feature should be removed from the training set. The restricted search for the FSP then consists in fixing the number of features to be considered by the classifier, introducing operators for removing or adding 1s to the binary strings in the GA (see [16] for details on restricted search for the FSP). In this paper we consider the application of the restricted search to the IQBE problem. We will show that this technique is able to improve the performance of a GA and an SA for IQBE in terms of convergence time to the optimal solution.
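The operator for removing or adding 1s can be illustrated with a small repair function. This is a sketch under stated assumptions rather than the operator of [16]: it assumes that the bits to flip are chosen uniformly at random, and the name `restrict` and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def restrict(bits: List[int], m: int, rng: Optional[random.Random] = None) -> List[int]:
    """Return a copy of `bits` containing exactly `m` ones.

    Assumption: surplus 1s are removed, and missing 1s are added,
    at positions chosen uniformly at random.
    """
    if not 0 <= m <= len(bits):
        raise ValueError("m must lie between 0 and the string length")
    rng = rng or random.Random()
    repaired = list(bits)
    ones = [i for i, b in enumerate(repaired) if b == 1]
    zeros = [i for i, b in enumerate(repaired) if b == 0]
    if len(ones) > m:
        # Too many selected terms: switch off the surplus.
        for i in rng.sample(ones, len(ones) - m):
            repaired[i] = 0
    elif len(ones) < m:
        # Too few selected terms: switch on the missing ones.
        for i in rng.sample(zeros, m - len(ones)):
            repaired[i] = 1
    return repaired


rng = random.Random(0)
child = [1, 1, 1, 1, 0, 0]  # e.g. an individual after crossover and mutation
print(sum(restrict(child, 2, rng)))  # always 2, whichever bits are flipped
```

Applied after crossover and mutation in the GA, or after bit mutation in SA, such a repair keeps every candidate inside the set of strings with exactly m ones.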

The rest of the paper is structured as follows: Section 2 presents the most important previous work related to IQBE and metaheuristics. Section 3 describes the compression–expansion strategy and the restricted search technique for improving the performance of GAs and SA in IQBE. In Section 4 we show, by means of computational experiments, the different possibilities that these techniques offer for improving the metaheuristics considered. Section 5 concludes the paper with some final remarks.

Previous approaches

IQBE is a process for assisting the users of an information retrieval system in the formulation of queries, based on the application of artificial intelligence methods. It was first proposed by Chen et al. [2]. In that paper, the authors presented a GA for learning the terms which better represent a relevant document provided by the user. In this approach an SA algorithm was also presented as another option for implementing an IQBE process. This first framework for IQBE was extended in order to

Improving metaheuristics convergence performance for IQBE

In this section we describe the two strategies presented for reducing the search space in IQBE problems. First we present the compression–expansion encoding strategy, and second, we introduce the restricted search strategy for IQBE.
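To give a sense of scale, the sketch below compares the number of candidate encodings with and without each strategy. It assumes one bit per term (`n_terms` bits in total), sets of `group_size` terms in the compressed stage of the compression–expansion strategy, and exactly `m` ones in the restricted search. The function name and the example values are illustrative and are not taken from the paper.

```python
import math
from typing import Tuple


def search_space_sizes(n_terms: int, group_size: int, m: int) -> Tuple[int, int, int]:
    """Count binary strings in each search space (under the assumptions above)."""
    full = 2 ** n_terms                                # unrestricted binary encoding
    compressed = 2 ** math.ceil(n_terms / group_size)  # one bit per set of terms
    restricted = math.comb(n_terms, m)                 # strings with exactly m ones
    return full, compressed, restricted


print(search_space_sizes(n_terms=20, group_size=4, m=5))
# → (1048576, 32, 15504)
```

The compressed count covers only the first stage; the refinement in the expanded encoding adds a second, smaller search over the surviving terms.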

Computational tests and analysis

In order to test our proposed techniques, we have used the well-known Cranfield documentary base. We have chosen this text collection because most previous works dealing with metaheuristics for IQBE [3], [8], [18] have used it to test their performance, which makes results easy to compare. The Cranfield documentary base has a total of 1398 documents related to all aspects of aeronautical engineering, with 225 associated queries. In order to test our techniques, we

Conclusions

In this paper we have presented and analyzed two strategies for reducing the time of convergence of two metaheuristics (genetic algorithms and simulated annealing) in IQBE. The first strategy that we have presented is a compression–expansion strategy, where the terms are arranged into sets of a given size. We have shown that if the significant terms of the query can be put together in the sets, the performance of the metaheuristics is considerably improved. However, if the significant terms are

Acknowledgements

The authors really appreciate the comments on the paper by the anonymous reviewers and Professor G. Laporte, which have helped to improve the quality of this article.

References (20)

[1] M. Caramia et al. Improving search results with data mining in a thematic search engine. Computers & Operations Research (2004)
[2] H. Chen et al. A machine learning approach to inductive query by examples: an experiment using relevance feedback genetic algorithms and simulated annealing. Journal of the American Society for Information Science (1998)
[3] O. Cordón et al. A review on the application of evolutionary computation to information retrieval. International Journal of Approximate Reasoning (2003)
[4] S. Mitra et al. Data mining in soft computing framework: a survey. IEEE Transactions on Neural Networks (2002)
[5] D. Goldberg. Genetic algorithms in search, optimization and machine learning (1989)
[6] J. Koza. Genetic programming (1992)
[7] S. Kirkpatrick et al. Optimization by simulated annealing. Science (1983)
[8] C. López-Pujalte et al. A test of genetic algorithms in relevance feedback. Information Processing & Management (2002)
[9] A.M. Robertson et al. An upperbound to the performance for ranked-output searching: optimal weighting of query terms using a genetic algorithm. Journal of Documentation (1996)
[10] L. Tamine et al. Multiple query evaluation based on an enhanced genetic algorithm. Information Processing & Management (2003)

There are more references available in the full text version of this article.
