DOI: 10.1145/1772690.1772836
poster

Optimizing two stage bigram language models for IR

Published: 26 April 2010

ABSTRACT

Although higher-order language models (LMs) have shown the benefit of capturing word dependencies for information retrieval (IR), tuning their increased number of free parameters remains a formidable engineering challenge. Consequently, in many real-world retrieval systems, applying higher-order LMs is the exception rather than the rule. In this study, we address the parameter tuning problem using a framework based on a linear ranking model in which different component models are incorporated as features. Using unigram and bigram LMs with two-stage smoothing as examples, we show that our method leads to a bigram LM that significantly outperforms its unigram counterpart and the well-tuned BM25 model.
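The abstract describes the approach only at a high level: component LMs become features of a linear ranking model, and the free parameters are then tuned against retrieval quality. The Python sketch below illustrates that idea on toy data; it is not the authors' implementation. Assumptions: unigram two-stage smoothing follows Zhai and Lafferty [3], with the query-background model approximated by the collection model; the bigram smoothing scheme, the parameter names (`mu`, `lam`, `mu2`, `lam2`, `w_bi`), the add-one collection floor, and the toy corpus are hypothetical; the greedy coordinate-wise grid search is a simple stand-in for a derivative-free optimizer such as Powell's method [6], which is not implemented here; and NDCG uses the common 2^rel - 1 gain variant of the measure in [8].

```python
import math
from collections import Counter


def two_stage(count, length, p_bg, p_user, mu, lam):
    # Stage 1: Dirichlet prior mu toward p_bg; stage 2: mix with p_user.
    dirichlet = (count + mu * p_bg) / (length + mu)
    return (1.0 - lam) * dirichlet + lam * p_user


class Collection:
    def __init__(self, docs):
        self.docs = [d.split() for d in docs]
        self.uni = Counter(w for d in self.docs for w in d)
        self.total = sum(self.uni.values())

    def p_coll(self, w):
        # Add-one floor (an assumption) keeps unseen words above zero.
        return (self.uni[w] + 1) / (self.total + len(self.uni) + 1)


def features(coll, query, doc, params):
    """Unigram and bigram query log-likelihood features for one document."""
    tf = Counter(doc)
    bi = Counter(zip(doc, doc[1:]))

    def p_uni(w):
        pc = coll.p_coll(w)  # p_user approximated by the collection model
        return two_stage(tf[w], len(doc), pc, pc, params["mu"], params["lam"])

    q = query.split()
    f_uni = sum(math.log(p_uni(w)) for w in q)
    f_bi = 0.0
    for prev, w in zip(q, q[1:]):
        # Hypothetical bigram scheme: Dirichlet prior toward the document's
        # smoothed unigram model, then a mix with the collection model.
        # tf[prev] stands in for the count of bigrams starting with prev.
        p = two_stage(bi[(prev, w)], tf[prev], p_uni(w), coll.p_coll(w),
                      params["mu2"], params["lam2"])
        f_bi += math.log(p)
    return f_uni, f_bi


def ndcg(ranked, rels, k=10):
    # NDCG@k with 2**rel - 1 gains and log2(rank + 1) discounts.
    dcg = sum((2 ** rels.get(d, 0) - 1) / math.log2(i + 2)
              for i, d in enumerate(ranked[:k]))
    ideal = sorted(rels.values(), reverse=True)[:k]
    idcg = sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg(coll, train, params):
    # Rank every document by the linear score f_uni + w_bi * f_bi.
    total = 0.0
    for query, rels in train:
        scored = []
        for i, doc in enumerate(coll.docs):
            f_uni, f_bi = features(coll, query, doc, params)
            scored.append((f_uni + params["w_bi"] * f_bi, i))
        ranked = [i for _, i in sorted(scored, reverse=True)]
        total += ndcg(ranked, rels)
    return total / len(train)


def coordinate_ascent(coll, train, grids, params, rounds=3):
    # Greedy one-parameter-at-a-time grid search (stand-in for Powell).
    best_params = dict(params)
    best = mean_ndcg(coll, train, best_params)
    for _ in range(rounds):
        for name, grid in grids.items():
            for value in grid:
                trial = dict(best_params, **{name: value})
                score = mean_ndcg(coll, train, trial)
                if score > best:
                    best, best_params = score, trial
    return best_params, best


docs = [
    "language models for information retrieval",
    "information retrieval with bigram language models",
    "models of retrieval for information",
    "cooking recipes for pasta",
]
# Toy judgments: (query, {doc index: graded relevance}).
train = [("bigram language models", {1: 2, 0: 1}),
         ("information retrieval", {0: 1, 1: 1, 2: 1})]
coll = Collection(docs)
start = {"mu": 100.0, "lam": 0.5, "mu2": 10.0, "lam2": 0.5, "w_bi": 0.0}
grids = {"mu": [10.0, 100.0, 1000.0], "lam": [0.1, 0.5, 0.9],
         "mu2": [1.0, 10.0, 100.0], "lam2": [0.1, 0.5, 0.9],
         "w_bi": [0.0, 0.5, 1.0]}
params, score = coordinate_ascent(coll, train, grids, start)
print(params, round(score, 3))
```

Fixing the unigram feature weight at 1 loses nothing, because multiplying every weight by the same positive constant leaves the ranking unchanged; only the relative bigram weight `w_bi` needs searching. Starting from `w_bi = 0` means the search begins at the pure unigram model and only accepts moves that improve mean NDCG on the training queries.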

References

  1. J. Gao, H. Qi, X. Xia, and J.-Y. Nie, "Linear discriminant model for information retrieval," in SIGIR, 2005, pp. 290–297.
  2. D. Metzler and W. B. Croft, "A Markov random field model for term dependencies," in SIGIR, 2005, pp. 472–479.
  3. C. Zhai and J. Lafferty, "Two-stage language models for information retrieval," in SIGIR, 2002, pp. 49–56.
  4. J. Gao, J.-Y. Nie, G. Wu, and G. Cao, "Dependence language model for information retrieval," in SIGIR, 2004, pp. 170–177.
  5. C. Zhai and J. Lafferty, "A study of smoothing methods for language models applied to information retrieval," ACM Trans. Inf. Syst., vol. 22, no. 2, pp. 179–214, 2004.
  6. M. J. D. Powell, "An efficient method for finding the minimum of a function of several variables without calculating derivatives," The Computer Journal, vol. 7, no. 2, pp. 155–162, 1964.
  7. M. Taylor, H. Zaragoza, N. Craswell, S. Robertson, and C. Burges, "Optimisation methods for ranking functions with multiple parameters," in CIKM, 2006, pp. 585–593.
  8. K. Järvelin and J. Kekäläinen, "Cumulated gain-based evaluation of IR techniques," ACM Trans. Inf. Syst., vol. 20, no. 4, pp. 422–446, 2002.

Published in

WWW '10: Proceedings of the 19th International Conference on World Wide Web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690

Copyright © 2010. Copyright is held by the author/owner(s).

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)