poster

Optimizing two stage bigram language models for IR

Authors:
Sara Javanmardi, University of California Irvine, Irvine, CA, USA
Jianfeng Gao, Microsoft Research, Redmond, WA, USA
Kuansan Wang, Microsoft Research, Redmond, WA, USA

WWW '10: Proceedings of the 19th international conference on World wide web, April 2010, Pages 1125–1126, https://doi.org/10.1145/1772690.1772836

Published: 26 April 2010

ABSTRACT

Although higher-order language models (LMs) have been shown to benefit information retrieval (IR) by capturing word dependencies, tuning their larger number of free parameters remains a formidable engineering challenge. Consequently, in many real-world retrieval systems, applying higher-order LMs is the exception rather than the rule. In this study, we address the parameter tuning problem using a framework based on a linear ranking model in which different component models are incorporated as features. Using unigram and bigram LMs with two-stage smoothing as examples, we show that our method leads to a bigram LM that significantly outperforms both its unigram counterpart and a well-tuned BM25 model.
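The abstract does not give the exact feature definitions, so the following is only a minimal sketch of the general idea, under stated assumptions. The unigram feature uses the standard two-stage smoothing form (a Dirichlet prior followed by Jelinek-Mercer interpolation), with the collection model reused as the query background model. The bigram feature, the function names, the parameter values, and the feature weights are all hypothetical and chosen for illustration; in the framework described above, the weights and smoothing parameters would be tuned on training data rather than fixed by hand.

```python
import math
from collections import Counter


def two_stage_unigram(word, doc_counts, doc_len, coll_prob, mu=1000.0, lam=0.1):
    """Two-stage smoothed p(word | doc): Dirichlet prior (mu), then Jelinek-Mercer (lam)."""
    # Assumption: the collection model doubles as the query background model.
    p_bg = coll_prob.get(word, 1e-9)
    p_dirichlet = (doc_counts.get(word, 0) + mu * p_bg) / (doc_len + mu)
    return (1.0 - lam) * p_dirichlet + lam * p_bg


def unigram_feature(query, doc_tokens, coll_prob, mu=1000.0, lam=0.1):
    """Log-likelihood of the query under the smoothed unigram document model."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return sum(math.log(two_stage_unigram(w, counts, n, coll_prob, mu, lam)) for w in query)


def bigram_feature(query, doc_tokens, coll_prob, mu2=100.0, mu=1000.0, lam=0.1):
    """Hypothetical bigram feature: document bigram counts smoothed toward the unigram model."""
    uni = Counter(doc_tokens)
    bi = Counter(zip(doc_tokens, doc_tokens[1:]))
    n = len(doc_tokens)
    total = 0.0
    for prev, w in zip(query, query[1:]):
        p_uni = two_stage_unigram(w, uni, n, coll_prob, mu, lam)
        p_bi = (bi.get((prev, w), 0) + mu2 * p_uni) / (uni.get(prev, 0) + mu2)
        total += math.log(p_bi)
    return total


def linear_score(features, weights):
    """Linear ranking model: weighted sum of component-model features."""
    return sum(w * f for w, f in zip(weights, features))


# Toy data (made up for illustration only).
coll_prob = {"language": 0.01, "model": 0.02, "retrieval": 0.005}
query = ["language", "model"]
doc_a = "language model for retrieval".split()
doc_b = "model of a language".split()
weights = (1.0, 1.0)  # hypothetical; would normally be learned

score_a = linear_score(
    (unigram_feature(query, doc_a, coll_prob), bigram_feature(query, doc_a, coll_prob)), weights)
score_b = linear_score(
    (unigram_feature(query, doc_b, coll_prob), bigram_feature(query, doc_b, coll_prob)), weights)
print(score_a > score_b)
```

Both toy documents contain each query word once and have the same length, so their unigram features are identical; only the bigram feature separates them, because the first document contains the query terms adjacent and in order.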

Index Terms

Optimizing two stage bigram language models for IR
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690
General Chairs: Michael Rappa (North Carolina State University, USA), Paul Jones (University of North Carolina at Chapel Hill, USA)
Program Chairs: Juliana Freire (University of Utah, USA), Soumen Chakrabarti (Indian Institute of Technology, India)
Copyright © 2010. Copyright is held by the author/owner(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bigram LM
parameter tuning
retrieval model
Qualifiers
- poster
Acceptance Rates
Overall acceptance rate: 1,899 of 8,196 submissions (23%)