Multinomial Randomness Models for Retrieval with Document Fields

  • Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

Document fields, such as the title or the headings of a document, offer a way to consider the structure of documents for retrieval. Most of the proposed approaches in the literature employ either a linear combination of scores assigned to different fields, or a linear combination of frequencies in the term frequency normalisation component. In the context of the Divergence From Randomness framework, we have a sound opportunity to integrate document fields in the probabilistic randomness model. This paper introduces novel probabilistic models for incorporating fields in the retrieval process using a multinomial randomness model and its information theoretic approximation. The evaluation results from experiments conducted with a standard TREC Web test collection show that the proposed models perform as well as a state-of-the-art field-based weighting model, while at the same time, they are theoretically founded and more extensible than current field-based models.
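
As a rough illustration of the two combination strategies contrasted above, and of how a multinomial distribution over field frequencies yields an information measure, consider the following Python sketch. It is not taken from the paper: the saturation function, the parameter k1, the field weights and the field probabilities are all hypothetical choices made for illustration only.

```python
import math


def saturate(tf, k1=1.2):
    """Hypothetical BM25-style saturation: increases with tf, bounded by k1 + 1."""
    return tf * (k1 + 1) / (tf + k1)


def combine_scores(field_tfs, weights):
    """Score every field separately, then take a weighted sum of the scores."""
    return sum(w * saturate(tf) for tf, w in zip(field_tfs, weights))


def combine_frequencies(field_tfs, weights):
    """Take a weighted sum of the per-field frequencies, then score once."""
    return saturate(sum(w * tf for tf, w in zip(field_tfs, weights)))


def multinomial_information(field_tfs, field_probs):
    """Return -log2 of the multinomial probability of the observed field counts.

    field_probs are assumed prior probabilities of a term occurrence
    falling in each field; they must sum to 1.
    """
    n = sum(field_tfs)
    # log of the multinomial coefficient n! / (tf_1! * ... * tf_k!)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(tf + 1) for tf in field_tfs)
    # log of the product of p_i ** tf_i, skipping fields with no occurrences
    log_prob = log_coeff + sum(
        tf * math.log(p) for tf, p in zip(field_tfs, field_probs) if tf > 0
    )
    return -log_prob / math.log(2)


if __name__ == "__main__":
    tfs = [1, 1]          # e.g. one occurrence in the title, one in the body
    weights = [1.0, 1.0]  # hypothetical field weights
    print(combine_scores(tfs, weights))
    print(combine_frequencies(tfs, weights))
    print(multinomial_information(tfs, [0.5, 0.5]))
```

With a non-linear saturation function the two strategies generally disagree: summing per-field scores rewards the same term separately in every field, whereas summing frequencies saturates only once. A multinomial model sidesteps the choice by treating the per-field frequencies jointly as a single observation.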

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Plachouras, V., Ounis, I. (2007). Multinomial Randomness Models for Retrieval with Document Fields. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_6

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
