Abstract
Evolutionary computation techniques are increasingly being applied to problems within Information Retrieval (IR). Genetic programming (GP) has previously been used with some success to evolve term-weighting schemes in IR. However, a fundamental problem with the solutions generated by this stochastic process is that they are often difficult to analyse. In this paper, we introduce two distance measures between the phenotypes (ranked lists) of the solutions (term-weighting schemes) returned by a GP process. Using these distance measures, we build trees that show how the solutions cluster in the solution space. Using this framework, we show that our evolved solutions lie in a different part of the solution space from two of the best benchmark term-weighting schemes available.
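As a rough illustration of the idea of comparing solutions through their ranked lists, the sketch below computes a distance between rankings and groups schemes into a tree. The abstract does not define the two distance measures or the tree-building method, so everything here is an assumption: the normalised Spearman footrule stands in for the distance, greedy average-linkage clustering stands in for the tree construction, and the scheme names and document identifiers are hypothetical.

```python
from itertools import combinations


def footrule_distance(rank_a, rank_b):
    """Normalised Spearman footrule between two rankings of the same documents.

    Assumption: this is a stand-in measure, not one of the paper's two.
    Returns 0.0 for identical rankings and 1.0 for a fully reversed ranking.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same documents")
    pos_b = {doc: i for i, doc in enumerate(rank_b)}
    total = sum(abs(i - pos_b[doc]) for i, doc in enumerate(rank_a))
    max_total = (len(rank_a) ** 2) // 2  # largest possible footrule sum
    return total / max_total if max_total else 0.0


def build_tree(rankings):
    """Greedy average-linkage clustering of schemes (assumed method).

    `rankings` maps a scheme name to its ranked list of document ids.
    Returns the tree as nested tuples of scheme names.
    """
    # Each cluster is (subtree, list of member scheme names).
    clusters = [(name, [name]) for name in sorted(rankings)]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a[1] for y in b[1]]
        return sum(
            footrule_distance(rankings[x], rankings[y]) for x, y in pairs
        ) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Hypothetical ranked lists returned by three schemes for one query.
rankings = {
    "benchmark_a": ["d1", "d2", "d3", "d4"],
    "benchmark_b": ["d1", "d2", "d4", "d3"],
    "evolved_1": ["d4", "d3", "d2", "d1"],
}
tree = build_tree(rankings)
```

In this toy data the two hypothetical benchmark schemes rank documents almost identically and so merge first, while the hypothetical evolved scheme joins the tree last. That is the kind of separation the abstract describes, though the example says nothing about the paper's actual results.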
Cite this article
Cummins, R., O’Riordan, C. Evolved term-weighting schemes in Information Retrieval: an analysis of the solution space. Artif Intell Rev 26, 35–47 (2006). https://doi.org/10.1007/s10462-007-9034-5
DOI: https://doi.org/10.1007/s10462-007-9034-5