Abstract
Tuning the term frequency normalisation parameter is a crucial issue in information retrieval (IR), as it has an important impact on retrieval performance. The classical pivoted normalisation approach suffers from the collection-dependence problem. As a consequence, it requires relevance assessments for each given collection to obtain the optimal parameter setting. In this paper, we tackle the collection-dependence problem by proposing a new tuning method based on measuring the normalisation effect. The proposed method refines and extends our methodology described in [7]. In our experiments, we evaluate our proposed tuning method on various TREC collections, for both normalisation 2 of the Divergence From Randomness (DFR) models and BM25’s normalisation method. Results show that for both normalisation methods, our tuning method significantly outperforms robust, empirically obtained baselines over diverse TREC collections, while having a marginal computational cost.
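For readers unfamiliar with the two normalisation methods named above, the following is a minimal sketch of their commonly cited forms. The abstract does not state these formulas, so they are given here as an assumption based on the standard definitions. The function names and the default values `b=0.75` and `c=1.0` are illustrative only. The sketch shows the functions whose parameters (`b` and `c`) are being tuned, not the proposed tuning method itself.

```python
import math


def bm25_tfn(tf, doc_len, avg_doc_len, b=0.75):
    """Length-normalised term frequency as used inside BM25 (assumed standard form).

    tfn = tf / ((1 - b) + b * l / avg_l)
    b = 0 disables length normalisation; b = 1 applies it fully.
    """
    return tf / ((1.0 - b) + b * doc_len / avg_doc_len)


def dfr_normalisation2(tf, doc_len, avg_doc_len, c=1.0):
    """DFR normalisation 2 (assumed standard form).

    tfn = tf * log2(1 + c * avg_l / l)
    """
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


if __name__ == "__main__":
    # Hypothetical values: a term occurring 3 times in documents of varying length,
    # in a collection whose average document length is 100 tokens.
    for doc_len in (50, 100, 200):
        print(doc_len,
              round(bm25_tfn(3, doc_len, 100), 3),
              round(dfr_normalisation2(3, doc_len, 100), 3))
```

In both functions a document of exactly average length leaves the raw term frequency unchanged under these default values, while longer documents have their term frequency scaled down. How strongly that scaling applies is governed by `b` and `c`, which is why their setting matters for retrieval performance.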
References
Amati, G.: Probabilistic Models for Information Retrieval based on Divergence from Randomness. PhD thesis, Department of Computing Science, University of Glasgow (2003)
Amati, G., van Rijsbergen, C.J.: Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems (TOIS) 20(4), 357–389 (2002)
Callan, J., Connell, M.: Query-based sampling of text databases. ACM Transactions on Information Systems (TOIS) 19(2), 97–130 (2001)
Chowdhury, A., McCabe, M.C., Grossman, D., Frieder, O.: Document normalization revisited. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, pp. 381–382 (2002)
Hawking, D.: Overview of the TREC-9 Web Track. In: Proceedings of the Ninth Text REtrieval Conference (TREC-9), Gaithersburg, MD, pp. 87–94 (2000)
Hawking, D., Voorhees, E., Craswell, N., Bailey, P.: Overview of the TREC-8 Web Track. In: Proceedings of the Eighth Text REtrieval Conference (TREC-8), Gaithersburg, MD, pp. 131–150 (1999)
He, B., Ounis, I.: A study of parameter tuning for term frequency normalization. In: Proceedings of the Twelfth ACM CIKM International Conference on Information and Knowledge Management, New Orleans, LA, pp. 10–16 (2003)
van Rijsbergen, C.J.: Information Retrieval, 2nd edn. Department of Computer Science. University of Glasgow (1979)
Robertson, S., Walker, S., Beaulieu, M.M., Gatford, M., Payne, A.: Okapi at TREC-4. In: NIST Special Publication 500-236: The Fourth Text REtrieval Conference (TREC-4), Gaithersburg, MD, pp. 73–96 (1995)
Silverstein, C., Henzinger, M.R., Marais, H., Moricz, M.: Analysis of a very large web search engine query log. SIGIR Forum 33(1), 6–12 (1999)
Singhal, A., Buckley, C., Mitra, M.: Pivoted document length normalization. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 21–29 (1996)
Sparck-Jones, K., Walker, S., Robertson, S.E.: A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management 36, 779–840 (2000)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, B., Ounis, I. (2005). Term Frequency Normalisation Tuning for BM25 and DFR Models. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_15
DOI: https://doi.org/10.1007/978-3-540-31865-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science, Computer Science (R0)