Abstract
Document fields, such as the title or the headings of a document, offer a way to take the structure of documents into account for retrieval. Most of the approaches proposed in the literature employ either a linear combination of the scores assigned to different fields, or a linear combination of field frequencies in the term frequency normalisation component. The Divergence From Randomness framework offers a sound opportunity to integrate document fields directly in the probabilistic randomness model. This paper introduces novel probabilistic models for incorporating fields in the retrieval process, using a multinomial randomness model and its information theoretic approximation. Evaluation results from experiments conducted with a standard TREC Web test collection show that the proposed models perform as well as a state-of-the-art field-based weighting model, while being theoretically founded and more extensible than current field-based models.
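As a rough illustration of the idea of a multinomial randomness model, the sketch below computes the information content (negative base-2 logarithm) of observing a term a given number of times in each field under a plain multinomial distribution. It is only a minimal sketch: the function name, the choice of fields, and the probabilities are hypothetical, and this is not the exact model or parameterisation used in the paper.

```python
import math


def multinomial_information(field_tfs, field_probs):
    """Return -log2 P(field_tfs) under a multinomial distribution.

    field_tfs   -- occurrences of a term in each field (illustrative)
    field_probs -- assumed probability of an occurrence falling in each
                   field; must be positive and sum to 1 (an assumption of
                   this sketch, not a value taken from the paper)
    """
    if len(field_tfs) != len(field_probs):
        raise ValueError("one probability is needed per field")
    if any(p <= 0.0 for p in field_probs):
        raise ValueError("probabilities must be positive")
    if abs(sum(field_probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    total = sum(field_tfs)
    # log of the multinomial coefficient total! / (tf_1! * ... * tf_k!)
    log_coeff = math.lgamma(total + 1) - sum(
        math.lgamma(tf + 1) for tf in field_tfs
    )
    # log of the product p_1^tf_1 * ... * p_k^tf_k
    log_powers = sum(tf * math.log(p) for tf, p in zip(field_tfs, field_probs))
    # Convert the natural-log probability to bits of information.
    return -(log_coeff + log_powers) / math.log(2)


if __name__ == "__main__":
    # Hypothetical fields: title, headings, body.
    bits = multinomial_information([1, 0, 3], [0.1, 0.1, 0.8])
    print(f"information content: {bits:.3f} bits")
```

In a Divergence From Randomness setting, a less probable observation under the randomness model carries more information, so a higher value here would correspond to a stronger signal for the term in that document.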
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Plachouras, V., Ounis, I. (2007). Multinomial Randomness Models for Retrieval with Document Fields. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_6
DOI: https://doi.org/10.1007/978-3-540-71496-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)