Frequentist and Bayesian Approach to Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3936)

Abstract

We introduce the hypergeometric models KL, DLH and DLLH using the Divergence From Randomness (DFR) approach, and we compare them with other relevant IR models. The hypergeometric models are based on the probability of jointly observing two probabilities: the relative within-document term frequency and the term frequency in the entire collection. Hypergeometric models are parameter-free models of IR. Experiments show that they perform excellently on both small and very large collections. We derive their foundations from the same IR probability space used in language modelling (LM). Finally, we discuss the difference between DFR and LM. In brief, DFR is a frequentist (Type I), or combinatorial, approach, whereas language models mix the two probabilities with a Bayesian (Type II) approach and are therefore inherently parametric.
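As a rough illustration of the frequentist/Bayesian contrast described above, the sketch below scores a term in two ways. The first is its informative content, -log2 P, under a plain hypergeometric urn model (the document is treated as tokens drawn without replacement from the collection). The second is Dirichlet-prior smoothing, a standard Bayesian language-model estimator. This is not the paper's KL, DLH or DLLH formula. The function names, the urn set-up and the `mu` default of 2500 are all hypothetical choices made for this example.

```python
import math


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    if k < 0 or k > n:
        return float("-inf")  # C(n, k) = 0 outside 0 <= k <= n
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeometric_inf(tf: int, doc_len: int, coll_tf: int, coll_len: int) -> float:
    """Informative content -log2 P(X = tf) under a hypergeometric urn.

    Assumed model (illustrative only): the document is doc_len tokens drawn
    without replacement from a collection of coll_len tokens, coll_tf of
    which are occurrences of the term. Every quantity comes from counts,
    so there is no free parameter to tune.
    """
    log_p = (
        log_binom(coll_tf, tf)
        + log_binom(coll_len - coll_tf, doc_len - tf)
        - log_binom(coll_len, doc_len)
    )
    return -log_p / math.log(2)  # convert natural log to bits


def dirichlet_log_prob(tf: int, doc_len: int, coll_tf: int, coll_len: int,
                       mu: float = 2500.0) -> float:
    """log p(t | d) under Dirichlet-prior smoothing (Bayesian mixing).

    The within-document and collection estimates are blended through the
    prior mass mu, a free parameter that has to be tuned per collection.
    The default value is a hypothetical choice for this example.
    """
    p_coll = coll_tf / coll_len
    return math.log((tf + mu * p_coll) / (doc_len + mu))


if __name__ == "__main__":
    # Hypothetical counts: 5 occurrences in a 100-token document; the term
    # accounts for 1,000 of 1,000,000 collection tokens.
    print(f"hypergeometric Inf: {hypergeometric_inf(5, 100, 1000, 1_000_000):.2f} bits")
    for mu in (500.0, 2500.0):
        print(f"Dirichlet mu={mu:.0f}: {dirichlet_log_prob(5, 100, 1000, 1_000_000, mu):.3f}")
```

With these counts the urn model expects about 0.1 occurrences of the term in the document (100 × 1,000 / 1,000,000). The observed count of 5 therefore carries a lot of information, and that conclusion requires no tuning. The Dirichlet estimate, by contrast, changes with the choice of `mu`, which is the kind of parametric dependence the abstract attributes to language models.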

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amati, G. (2006). Frequentist and Bayesian Approach to Information Retrieval. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_3

  • DOI: https://doi.org/10.1007/11735106_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33347-0

  • Online ISBN: 978-3-540-33348-7

  • eBook Packages: Computer Science, Computer Science (R0)
