An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions

Artificial Intelligence Review

Abstract

Machine learning approaches to information retrieval are becoming increasingly widespread. In this paper, we present term-weighting functions reported in the literature that were developed by four separate approaches using genetic programming. Recently, a number of axioms (constraints), from which all good term-weighting schemes should be deduced, have been developed and shown to be theoretically and empirically sound. We introduce a new axiom and empirically validate it by modifying the standard BM25 scheme. Furthermore, we analyse the BM25 scheme and the four learned schemes presented to determine if the schemes are consistent with the axioms. We find that one learned term-weighting approach is consistent with more axioms than any of the other schemes. An empirical evaluation of the schemes on various test collections and query lengths shows that the scheme that is consistent with more of the axioms outperforms the other schemes.
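As an illustration of the kind of check the abstract describes, the following is a minimal sketch of one commonly cited form of the BM25 term weight, followed by a few sanity checks in the spirit of axiomatic constraints. The function names, the parameter defaults (`k1=1.2`, `b=0.75`) and the toy collection statistics are assumptions made for this example. They are not taken from the paper, and the new axiom the paper introduces is not stated in the abstract, so it is not modelled here.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Weight of one query term in one document (a common Okapi BM25 form).

    tf: occurrences of the term in the document
    df: number of documents containing the term
    k1, b: conventional default values, assumed here for illustration
    """
    if tf == 0:
        return 0.0
    # idf component; it becomes negative when df > num_docs / 2.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    # Document-length normalisation applied to the saturation constant.
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    # Saturating term-frequency component scaled by idf.
    return idf * (tf * (k1 + 1.0)) / (tf + norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, doc_freq, num_docs):
    """Sum the per-term weights over the query terms."""
    return sum(
        bm25_term_weight(doc_tf.get(t, 0), doc_len, avg_doc_len,
                         doc_freq.get(t, 0), num_docs)
        for t in query_terms
    )


# Toy checks (hypothetical statistics: 1000 documents, average length 100).
rare_once = bm25_term_weight(tf=1, doc_len=100, avg_doc_len=100, df=10, num_docs=1000)
rare_twice = bm25_term_weight(tf=2, doc_len=100, avg_doc_len=100, df=10, num_docs=1000)
rare_twice_long = bm25_term_weight(tf=2, doc_len=200, avg_doc_len=100, df=10, num_docs=1000)
common_once = bm25_term_weight(tf=1, doc_len=100, avg_doc_len=100, df=600, num_docs=1000)

print("more occurrences score higher:", rare_twice > rare_once)
print("longer document scores lower:", rare_twice_long < rare_twice)
print("very common term gets a negative weight:", common_once < 0)
```

The last check shows why an axiomatic analysis can be informative: with this idf form, a term that occurs in more than half of the documents lowers the score of a document that contains it, which conflicts with the intuition that matching a query term should not hurt.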



Author information


Correspondence to Ronan Cummins.


About this article

Cite this article

Cummins, R., O’Riordan, C. An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. Artif Intell Rev 28, 51–68 (2007). https://doi.org/10.1007/s10462-008-9074-5


  • DOI: https://doi.org/10.1007/s10462-008-9074-5
