Abstract
This paper tackles information retrieval (IR) with meta-heuristics. We propose two ant colony optimization (ACO) algorithms for information retrieval on large-scale data sets. The main difficulty lies in modeling information retrieval for meta-heuristics, which usually require links between documents so that the search process can move from one document to another. The first novelty of this work is the design of such a model, which adapts ACO approaches, and potentially other meta-heuristics, to IR. The second is the hybridization of the ACO approaches with tabu search to improve efficiency. The designed algorithms and a classical information retrieval method were implemented for comparison. Experiments were conducted on the CACM, RCV1 and random benchmarks. Numerical results show that ACO is scalable while achieving the same solution quality as the traditional IR process.
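The idea of moving between linked documents under a tabu restriction can be illustrated with a small sketch. It is not the algorithm of this chapter: the link model (two documents are linked when they share a term), the cosine scoring, the pheromone update and every name and parameter value below are assumptions chosen only to make the example self-contained.

```python
import math
import random
from collections import deque


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_links(docs):
    """Assumed link model: two documents are linked if they share a term."""
    index = {}
    for d, vec in docs.items():
        for t in vec:
            index.setdefault(t, set()).add(d)
    return {d: sorted(set().union(*(index[t] for t in vec)) - {d})
            for d, vec in docs.items()}


def aco_tabu_search(docs, query, n_ants=5, n_iters=20, steps=4,
                    alpha=1.0, beta=2.0, rho=0.1, tabu_size=3, seed=0):
    """Hypothetical ACO walk over linked documents with a short tabu list."""
    rng = random.Random(seed)
    links = build_links(docs)
    ids = sorted(docs)
    pheromone = {d: 1.0 for d in ids}
    cache = {}

    def sim(d):
        # Score a document only when an ant actually considers it.
        if d not in cache:
            cache[d] = cosine(query, docs[d])
        return cache[d]

    best = None
    for _ in range(n_iters):
        visited = []
        for _ in range(n_ants):
            current = rng.choice(ids)
            tabu = deque([current], maxlen=tabu_size)  # recently visited docs
            visited.append(current)
            for _ in range(steps):
                candidates = [d for d in links[current] if d not in tabu]
                if not candidates:
                    break
                # Transition weight: pheromone^alpha * heuristic^beta.
                weights = [pheromone[d] ** alpha * (sim(d) + 1e-6) ** beta
                           for d in candidates]
                current = rng.choices(candidates, weights=weights, k=1)[0]
                tabu.append(current)
                visited.append(current)
        for d in ids:                      # evaporation
            pheromone[d] *= 1.0 - rho
        for d in visited:                  # deposit proportional to relevance
            pheromone[d] += sim(d)
            if best is None or sim(d) > sim(best):
                best = d
    return best, sim(best)


# Toy collection and query (illustrative data only).
docs = {
    "d1": {"ant": 1.0, "colony": 1.0},
    "d2": {"ant": 1.0, "retrieval": 2.0},
    "d3": {"retrieval": 1.0, "index": 1.0},
    "d4": {"tabu": 1.0, "index": 1.0},
}
query = {"ant": 1.0, "retrieval": 1.0}
best_doc, best_score = aco_tabu_search(docs, query)
print(best_doc, round(best_score, 4))
```

In this sketch the tabu list only prevents an ant from stepping back onto a recently visited document, and similarities are computed lazily so that documents no ant reaches are never scored, which is where any saving over an exhaustive scan would come from.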
Benchmarks
1.1 CACM
CACM is a collection of abstracts of articles published in the journal Communications of the ACM between 1958 and 1979. Table 6 shows its characteristics. Although the designed algorithms target large-scale document collections, they perform well on this small collection and outperform the exact algorithms in terms of runtime. However, the collection remains too small to reveal the real impact of the proposed approach.
1.2 RCV1
Reuters Corpus Volume I (RCV1) is a collection of more than 800,000 documents from the archives published by Reuters, Ltd. It is now publicly available to researchers. Table 7 shows its size parameters.
1.3 The Random collection
A larger random collection was generated automatically. Table 8 shows its characteristics.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Drias, Y., Kechid, S. (2015). Hybrid ACO and Tabu Search for Large Scale Information Retrieval. In: Laalaoui, Y., Bouguila, N. (eds) Artificial Intelligence Applications in Information and Communication Technologies. Studies in Computational Intelligence, vol 607. Springer, Cham. https://doi.org/10.1007/978-3-319-19833-0_2
DOI: https://doi.org/10.1007/978-3-319-19833-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19832-3
Online ISBN: 978-3-319-19833-0
eBook Packages: Engineering, Engineering (R0)