DOI: 10.1145/2808194.2809486
short-paper

Verboseness Fission for BM25 Document Length Normalization

Published: 27 September 2015

ABSTRACT

BM25 is probably the best-known term weighting model in Information Retrieval. Depending on the formula variant, it has two or three parameters (k1, b, and k3). This paper addresses b, the document length normalization parameter. Previous work discussed two cases for length normalization: multi-topicality and verboseness. We observe that there are actually three: multi-topicality, verboseness with word repetition (repetitiveness), and verboseness with synonyms. Based on this observation, we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we obtain results statistically indistinguishable from the optimal results, thereby removing the need for ground-truth-based optimization.
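For orientation, the role of b can be seen in a minimal sketch of the classic Okapi BM25 term weight. This is the widely used textbook form, not the normalization method proposed in the paper. The function name, the default values k1=1.2 and b=0.75, and the smoothed IDF variant are assumptions chosen for illustration. The query-side parameter k3 is omitted.

```python
import math

def bm25_term_weight(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Classic BM25 weight of one query term in one document (illustrative sketch).

    tf: term frequency in the document
    doc_len, avg_doc_len: document length and collection average length
    df: number of documents containing the term
    n_docs: number of documents in the collection
    k1, b: conventional defaults, assumed here (not taken from the paper)
    """
    # Length normalization: b=0 ignores document length, b=1 normalizes fully.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating term-frequency component.
    tf_part = tf * (k1 + 1.0) / (tf + k1 * norm)
    # One common smoothed, non-negative IDF variant (an assumption).
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * tf_part
```

With b=0 the weight no longer depends on document length, while with b=1 a document longer than average is penalized in proportion to its length. Choosing a value between these extremes is the tuning step the abstract refers to.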


    • Published in

      ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
      September 2015
      402 pages
      ISBN: 9781450338332
      DOI: 10.1145/2808194

      Copyright © 2015 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ICTIR '15 Paper Acceptance Rate: 29 of 57 submissions, 51%. Overall Acceptance Rate: 209 of 482 submissions, 43%.
