
Evolved term-weighting schemes in Information Retrieval: an analysis of the solution space

Artificial Intelligence Review

Abstract

Evolutionary computation techniques are increasingly being applied to problems within Information Retrieval (IR). Genetic programming (GP) has previously been used with some success to evolve term-weighting schemes in IR. However, one fundamental problem with the solutions generated by this stochastic, non-deterministic process is that they are often difficult to analyse. In this paper, we introduce two different distance measures between the phenotypes (ranked lists) of the solutions (term-weighting schemes) returned by a GP process. Using these distance measures, we build trees that show how different solutions are clustered in the solution space. Using this framework, we show that our evolved solutions lie in a different part of the solution space from two of the best available benchmark term-weighting schemes.
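The abstract does not spell out either distance measure or how the trees are built, so the following is only a minimal Python sketch under stated assumptions. `overlap_distance` (shared documents in the top k) and `footrule_distance` (a normalised Spearman footrule over the top k) are hypothetical stand-ins for "a distance between two ranked lists", not the paper's definitions. Per-query distances are averaged to give a distance between two schemes' phenotypes, and UPGMA (average-linkage clustering) is assumed as one common way to turn such a distance table into a tree. The scheme names and rankings in `RUNS` are invented toy data.

```python
from itertools import combinations

# NOTE: both measures are illustrative stand-ins, not the paper's own
# definitions. Each ranked list is assumed to hold at least k document ids.

def overlap_distance(a, b, k):
    """1 minus the fraction of documents the two top-k lists share."""
    return 1.0 - len(set(a[:k]) & set(b[:k])) / k

def footrule_distance(a, b, k):
    """Normalised Spearman footrule over the top k; a document absent from one
    list is treated as sitting at rank k (just past the cut-off)."""
    rank_a = {doc: i for i, doc in enumerate(a[:k])}
    rank_b = {doc: i for i, doc in enumerate(b[:k])}
    total = sum(abs(rank_a.get(d, k) - rank_b.get(d, k))
                for d in set(rank_a) | set(rank_b))
    return total / (k * (k + 1))  # k(k+1) is the value for two disjoint lists

def phenotype_distance(runs_a, runs_b, k, measure):
    """Average a per-query list distance over queries (lists aligned by query)."""
    return sum(measure(a, b, k) for a, b in zip(runs_a, runs_b)) / len(runs_a)

def distance_table(runs, k, measure):
    """Pairwise phenotype distances between every pair of schemes."""
    return {frozenset((x, y)): phenotype_distance(runs[x], runs[y], k, measure)
            for x, y in combinations(runs, 2)}

def upgma(labels, table):
    """Agglomerative UPGMA clustering; returns the tree topology in Newick form."""
    size = {label: 1 for label in labels}
    dist = dict(table)
    while len(size) > 1:
        # Merge the closest pair of active clusters (ties broken by sorted order).
        x, y = min(combinations(sorted(size), 2),
                   key=lambda p: dist[frozenset(p)])
        merged = f"({x},{y})"
        for z in size:
            if z not in (x, y):
                # Size-weighted average of the distances from the two members.
                dist[frozenset((merged, z))] = (
                    size[x] * dist[frozenset((x, z))]
                    + size[y] * dist[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[merged] = size.pop(x) + size.pop(y)
    return next(iter(size)) + ";"

# Hypothetical toy data: scheme name -> one ranked list per query.
RUNS = {
    "evolved_1": [["d1", "d2", "d3", "d4"], ["d10", "d11", "d12", "d13"]],
    "evolved_2": [["d2", "d1", "d3", "d5"], ["d10", "d11", "d13", "d12"]],
    "bench_A":   [["d7", "d8", "d3", "d9"], ["d14", "d15", "d12", "d16"]],
    "bench_B":   [["d8", "d7", "d9", "d6"], ["d15", "d14", "d16", "d17"]],
}

if __name__ == "__main__":
    table = distance_table(RUNS, 4, overlap_distance)
    print(upgma(list(RUNS), table))
```

On this toy data either measure prints `((bench_A,bench_B),(evolved_1,evolved_2));`, a Newick string in which the two hypothetical evolved schemes form one cluster and the two hypothetical benchmarks form another. Working on ranked lists rather than on the GP expression trees means two syntactically different schemes that rank documents identically end up at distance zero.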



Author information


Correspondence to Ronan Cummins.


About this article

Cite this article

Cummins, R., O’Riordan, C. Evolved term-weighting schemes in Information Retrieval: an analysis of the solution space. Artif Intell Rev 26, 35–47 (2006). https://doi.org/10.1007/s10462-007-9034-5



  • DOI: https://doi.org/10.1007/s10462-007-9034-5
