ABSTRACT
Although higher-order language models (LMs) have shown the benefit of capturing word dependencies for information retrieval (IR), tuning their increased number of free parameters remains a formidable engineering challenge. Consequently, in many real-world retrieval systems, applying higher-order LMs is the exception rather than the rule. In this study, we address the parameter tuning problem using a framework based on a linear ranking model in which different component models are incorporated as features. Using unigram and bigram LMs with two-stage smoothing as examples, we show that our method leads to a bigram LM that significantly outperforms both its unigram counterpart and a well-tuned BM25 model.
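To make the idea of "component models incorporated as features" concrete, here is a minimal sketch, not the authors' implementation. It assumes the unigram feature is the query log-likelihood under two-stage smoothing (a Dirichlet prior with parameter `mu`, then interpolation with weight `lam` against the collection model, which stands in for the query background model). The bigram feature uses a hypothetical estimator that Dirichlet-smooths document bigram counts toward that unigram estimate (`mu_bi`). The two features are combined linearly. Only the bigram weight is tuned here, by a plain grid search on reciprocal rank; the smoothing parameters are held fixed. Both the grid search and the choice of metric are illustrative stand-ins for whatever optimizer and measure the paper actually uses. All function names are hypothetical.

```python
import math
from collections import Counter

def collection_model(docs):
    """Maximum-likelihood unigram model p(w | C) over all documents."""
    counts = Counter(t for doc in docs.values() for t in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def unigram_2stage(term, doc_tf, doc_len, coll_prob, mu, lam):
    """Two-stage smoothed p(term | d): Dirichlet prior (mu), then linear
    interpolation (lam) with the collection model as background."""
    p_c = coll_prob.get(term, 1e-9)  # assumed floor for terms unseen in the collection
    p_dir = (doc_tf.get(term, 0) + mu * p_c) / (doc_len + mu)
    return (1 - lam) * p_dir + lam * p_c

def features(query, doc, coll_prob, mu, lam, mu_bi):
    """Return (unigram, bigram) query log-likelihood features for one document."""
    tf = Counter(doc)
    bf = Counter(zip(doc, doc[1:]))
    f_uni, f_bi = 0.0, 0.0
    for i, w in enumerate(query):
        p_uni = unigram_2stage(w, tf, len(doc), coll_prob, mu, lam)
        f_uni += math.log(p_uni)
        if i > 0:
            prev = query[i - 1]
            # Hypothetical bigram estimate: smooth bigram counts toward p_uni.
            p_bi = (bf.get((prev, w), 0) + mu_bi * p_uni) / (tf.get(prev, 0) + mu_bi)
            f_bi += math.log(p_bi)
    return f_uni, f_bi

def rank(query, docs, coll_prob, weights, params):
    """Rank documents by the linear score w_uni * f_uni + w_bi * f_bi."""
    w_uni, w_bi = weights
    scored = []
    for doc_id, doc in docs.items():
        f_uni, f_bi = features(query, doc, coll_prob, *params)
        scored.append((w_uni * f_uni + w_bi * f_bi, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]

def tune_bigram_weight(train, docs, coll_prob, params, grid):
    """Grid-search w_bi with w_uni fixed to 1, maximizing mean reciprocal rank.
    Fixing w_uni loses nothing: ranking is invariant to positive rescaling."""
    best_w, best_mrr = None, -1.0
    for w_bi in grid:
        rr = []
        for query, relevant in train:
            ranking = rank(query, docs, coll_prob, (1.0, w_bi), params)
            first = next((i for i, d in enumerate(ranking) if d in relevant), None)
            rr.append(0.0 if first is None else 1.0 / (first + 1))
        mrr = sum(rr) / len(rr)
        if mrr > best_mrr:
            best_w, best_mrr = w_bi, mrr
    return best_w, best_mrr

if __name__ == "__main__":
    docs = {"d1": "new york hotels and restaurants".split(),
            "d2": "new car york car".split()}
    coll = collection_model(docs)
    params = (10.0, 0.1, 2.0)  # (mu, lam, mu_bi), chosen arbitrarily for the toy data
    train = [(["new", "york"], {"d1"})]
    print(tune_bigram_weight(train, docs, coll, params, [0.0, 0.25, 0.5, 1.0]))
    # (0.25, 1.0)
```

On this toy data the unigram feature alone ranks the shorter document `d2` first. A positive bigram weight rewards `d1`, which contains the query phrase "new york" contiguously. This is the kind of dependency a bigram model captures. Because every parameter enters one scoring function, the smoothing parameters could be searched the same way, at the cost of a larger search space.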
Index Terms
- Optimizing two stage bigram language models for IR