Improving metaheuristics convergence properties in inductive query by example using two strategies for reducing the search space
Introduction
In recent years, emergent and nature-based algorithms have been widely studied and applied to information retrieval problems [1], [2], [3], [4]. Most information retrieval applications tackled with evolutionary-type approaches are related to relevance feedback (RF) or to inductive query by example (IQBE) [3]. RF is a query optimization method that uses the information provided by the user after he or she has seen a first retrieval by the system. In IQBE, by contrast, no feedback is needed from the user, and the query optimization is based entirely on off-line machine learning algorithms. In this paper we focus on the IQBE approach.
IQBE is a process for assisting users in the formulation of queries, based on the application of machine learning methods. IQBE, initially proposed by Chen et al. [2], applies an off-line learning algorithm in order to find a set of relevant documents among the initial set of documents provided by a user. IQBE has been the subject of intensive research in the last few years, since it is a paradigm that could be very useful for fulfilling users' needs in future information retrieval systems.
In the literature many different approaches to IQBE can be found, most of them based on metaheuristic algorithms such as genetic algorithms (GA) [5], genetic programming (GP) [6], simulated annealing (SA) [7], or hybrid techniques involving several of these algorithms. The first work on IQBE was that of Chen et al. [2], where a standard GA with binary encoding and fixed size was tested and compared with an SA approach to IQBE. Since then, GAs have been widely applied to improve query definition, for example in [8], [9], [10], [11]. There are also approaches based on GP, for example [12], [13], and on SA [14], where a hybrid simulated annealing-GP algorithm was presented.
Despite the large body of work on improving metaheuristic algorithms for IQBE, several problems related to algorithm performance remain unresolved (specifically in GA and SA), such as poor results in the design of large queries and convergence issues, mainly the time needed for convergence. This last point has not been fully considered in the literature until now: most works focus on improving the performance of the algorithms in terms of the fitness function considered, whereas computational cost and running time have been treated as a secondary issue.
In this paper we propose two different techniques for reducing the convergence time of metaheuristics (GAs and SA) in Boolean IQBE, by means of a reduction of the metaheuristics' search space. We consider the standard GA and SA algorithms, both using binary encoding, with the standard operators: selection, crossover and mutation in the GA [5], and bit mutation in SA [7]. In this framework, we apply two techniques which reduce the convergence time of the algorithms. The idea is to test the two proposed techniques on simple, standard GA and SA algorithms applied to IQBE problems, by comparing the performance and convergence time of the algorithms with and without the proposed techniques.
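As a point of reference for the baseline mentioned above, the following is a minimal sketch of simulated annealing over a binary string with single-bit mutation. It is not the authors' implementation: the geometric cooling schedule, the parameter values, and the toy fitness `match_count` (which simply counts agreements with a hypothetical `target` string) are all assumptions made for illustration, standing in for the IQBE fitness function that is not defined in this excerpt.

```python
import math
import random
from typing import Callable, List, Tuple


def simulated_annealing(
    fitness: Callable[[List[int]], float],
    n_bits: int,
    steps: int = 2000,
    t0: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Maximise `fitness` over binary strings using single-bit mutation."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    temperature = t0
    for _ in range(steps):
        # Bit mutation: flip one randomly chosen position.
        candidate = list(current)
        i = rng.randrange(n_bits)
        candidate[i] ^= 1
        cand_fit = fitness(candidate)
        delta = cand_fit - current_fit
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature decreases.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = list(current), current_fit
        temperature *= cooling  # geometric cooling (an assumption)
    return best, best_fit


# Hypothetical toy problem: recover a hidden binary string.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def match_count(bits: List[int]) -> int:
    """Number of positions where `bits` agrees with `target`."""
    return sum(int(a == b) for a, b in zip(bits, target))


best, best_fit = simulated_annealing(match_count, len(target))
```

The search space here has 2^n points for n bits, which is why both techniques proposed in the paper aim at shrinking the space the metaheuristic has to explore.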
First, we propose the use of a compression–expansion strategy. The objective of this strategy is to evolve several bits at the same time, in such a way that, if the terms evolved together are significant enough, the algorithm converges faster than the same algorithm without the compression scheme. In a second step, our algorithm refines the search in the expanded encoding, after removing the terms which have been eliminated in the compressed one. The compression–expansion encoding strategy proposed in this paper is inspired by the building block hypothesis in GAs [5]. According to this hypothesis, a GA constructs good solutions from small, high-fitness sub-solutions. If, for a given problem, the building blocks can be identified in advance, evolving them together will considerably improve the performance of the algorithm on that problem. The main difficulty is that, for a general problem, it is nearly impossible to identify the building blocks in advance. IQBE is a special case, however, since the building blocks should be formed by those terms which are significant for the user.
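The compression–expansion idea can be illustrated with a short sketch. It assumes that terms are grouped into contiguous sets of a fixed size and that one compressed bit switches a whole group on or off; the actual grouping used by the authors is not given in this excerpt, and the helper names `expand` and `surviving_terms`, as well as the example term list, are hypothetical.

```python
from typing import List


def expand(compressed: List[int], group_size: int, n_terms: int) -> List[int]:
    """Copy each compressed bit to every term of its group."""
    expanded: List[int] = []
    for bit in compressed:
        expanded.extend([bit] * group_size)
    # The last group may hold fewer than `group_size` terms.
    return expanded[:n_terms]


def surviving_terms(compressed: List[int], group_size: int, n_terms: int) -> List[int]:
    """Indices of the terms kept for the refinement stage."""
    bits = expand(compressed, group_size, n_terms)
    return [i for i, bit in enumerate(bits) if bit == 1]


# Hypothetical example: 7 terms in groups of 3 give 3 compressed bits.
terms = ["wing", "flow", "shock", "heat", "drag", "lift", "mach"]
compressed = [1, 0, 1]  # assumed outcome of the search in the compressed space

full = expand(compressed, 3, len(terms))
kept = surviving_terms(compressed, 3, len(terms))
kept_terms = [terms[i] for i in kept]
```

In this sketch the compressed search works on 3 bits instead of 7, and the refinement stage then searches only over the 4 surviving terms rather than the original 7.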
The second technique we present in this paper is the so-called restricted search. The restricted search is an idea that has been successfully applied to the binary feature selection problem (FSP) in classification [15], [16]. The FSP consists of removing the meaningless features of a given classification problem, which only introduce noise to the classifier, without giving any helpful information to perform the classification task. The FSP can be tackled by means of metaheuristics such as GAs or SA, by using a binary string encoding. Using this encoding, a 1 stands for a given feature to be considered for the classifier, whereas a 0 in the encoding means that the feature should be removed from the training set. The restricted search for the FSP then consists in fixing the number of features to be considered by the classifier, introducing operators for removing or adding 1s to the binary strings in the GA (see [16] for details on restricted search for the FSP). In this paper we consider the application of the restricted search to the IQBE problem. We will show that this technique is able to improve the performance of a GA and an SA for IQBE in terms of convergence time to the optimal solution.
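A minimal sketch of a restricted search repair step follows, assuming it is applied to a binary string after the usual variation operators and that the bits to flip are chosen uniformly at random. The function name `restrict` and the random selection rule are assumptions for illustration; the operator actually used is described in [16], not in this excerpt.

```python
import random
from typing import List, Optional


def restrict(bits: List[int], m: int, rng: Optional[random.Random] = None) -> List[int]:
    """Return a copy of `bits` with exactly `m` ones.

    Surplus 1s are switched off and missing 1s are switched on,
    at positions chosen at random (an assumption of this sketch).
    """
    if not 0 <= m <= len(bits):
        raise ValueError("m must lie between 0 and len(bits)")
    rng = rng or random.Random()
    repaired = list(bits)
    ones = [i for i, b in enumerate(repaired) if b == 1]
    zeros = [i for i, b in enumerate(repaired) if b == 0]
    if len(ones) > m:
        # Too many features selected: remove the surplus.
        for i in rng.sample(ones, len(ones) - m):
            repaired[i] = 0
    elif len(ones) < m:
        # Too few features selected: add the missing ones.
        for i in rng.sample(zeros, m - len(ones)):
            repaired[i] = 1
    return repaired


rng = random.Random(0)
too_many = restrict([1, 1, 1, 1, 0, 0], 2, rng)
too_few = restrict([0, 0, 0, 0, 1, 0], 3, rng)
unchanged = restrict([1, 0, 1, 0], 2, rng)
```

Fixing the number of 1s to m restricts the search from all 2^n binary strings of length n to the C(n, m) strings with exactly m ones, which is the source of the search space reduction.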
The rest of the paper is structured as follows: Section 2 presents the most important previous work related to IQBE and metaheuristics. Section 3 describes the compression–expansion strategy and the restricted search technique for improving the performance of GAs and SA in IQBE. In Section 4 we show, by means of computational experiments, the different possibilities that these techniques offer for improving the metaheuristics considered. Section 5 concludes the paper with some final remarks.
Previous approaches
IQBE is a process for assisting the users of an information retrieval system in the formulation of queries, based on the application of artificial intelligence methods. It was first proposed by Chen et al. [2]. In that paper, the authors presented a GA for learning the terms which best represent a relevant document provided by the user. An SA algorithm was also presented there as another option for implementing an IQBE process. This first framework for IQBE was extended in order to
Improving metaheuristics convergence performance for IQBE
In this section we describe the two strategies presented for reducing the search space in IQBE problems. First we present the compression–expansion encoding strategy, and second, we introduce the restricted search strategy for IQBE.
Computational tests and analysis
In order to test our proposed techniques, we have used the well-known Cranfield document collection. We have chosen this text collection because most previous works on metaheuristics for IQBE [3], [8], [18] have used it to test their performance, which makes results easy to compare. The Cranfield collection has a total of 1398 documents related to all aspects of aeronautical engineering, with 225 associated queries. In order to test our techniques, we
Conclusions
In this paper we have presented and analyzed two strategies for reducing the time of convergence of two metaheuristics (genetic algorithms and simulated annealing) in IQBE. The first strategy that we have presented is a compression–expansion strategy, where the terms are arranged into sets of a given size. We have shown that if the significant terms of the query can be put together in the sets, the performance of the metaheuristics is considerably improved. However, if the significant terms are
Acknowledgements
The authors really appreciate the comments on the paper by the anonymous reviewers and Professor G. Laporte, which have helped to improve the quality of this article.
References (20)
- et al. Improving search results with data mining in a thematic search engine. Computers & Operations Research (2004)
- et al. A machine learning approach to inductive query by examples: an experiment using relevance feedback genetic algorithms and simulated annealing. Journal of the American Society for Information Science (1998)
- et al. A review on the application of evolutionary computation to information retrieval. International Journal of Approximate Reasoning (2003)
- et al. Data mining in soft computing framework: a survey. IEEE Transactions on Neural Networks (2002)
- Genetic algorithms in search, optimization and machine learning (1989)
- Genetic programming (1992)
- et al. Optimization by simulated annealing. Science (1983)
- et al. A test of genetic algorithms in relevance feedback. Information Processing & Management (2002)
- et al. An upperbound to the performance for ranked-output searching: optimal weighting of query terms using a genetic algorithm. Journal of Documentation (1996)
- et al. Multiple query evaluation based on an enhanced genetic algorithm. Information Processing & Management (2003)
Cited by (3)
A Tutorial On the design, experimentation and application of metaheuristic algorithms to real-World optimization problems
2021, Swarm and Evolutionary Computation
Optimal switch location in mobile communication networks using hybrid genetic algorithms
2008, Applied Soft Computing Journal
Citation Excerpt: This constraint has been previously applied in the literature. Specifically, the same constraint regarding the number of 1s in GA and SA have been solved by means of the so-called restricted search operator in [26] and [27]. The restricted search basically considers one extra operator to be added to the conventional GA, in the following way: after the application of the crossover and mutation operators, the individual x will have p 1s that, in general, will be different from the desired number of desired 1s in x, M.
On the genotype compression and expansion for evolutionary algorithms in the continuous domain
2021, GECCO 2021 Companion - Proceedings of the 2021 Genetic and Evolutionary Computation Conference Companion