Abstract
Most information retrieval systems rely on strict equality between document terms and query terms to retrieve the documents relevant to a given query. The term mismatch problem arises when users and document authors use different terms to express the same meaning. Statistical translation models have been proposed as an effective way to adapt language models to mitigate the term mismatch problem by exploiting semantic relations between terms. However, estimating translation probabilities has been shown to be both crucial and difficult within statistical translation models. We therefore present an alternative approach to statistical translation models that formally incorporates semantic relations between indexing terms into language models. Experiments on different CLEF corpora from the medical domain show a statistically significant improvement in retrieval performance over ordinary language models, and in most cases better performance than translation models. The improvement is related to the rate of general terms and their distribution within the queries.
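To make the term mismatch problem concrete, the following minimal sketch contrasts an exact-match query likelihood with the commonly used translation language model formulation, P(q|D) = ∏_q Σ_w P_t(q|w) P(w|D), usually attributed to Berger and Lafferty. It is not the approach proposed in this paper, whose details the abstract does not give. The function names, the toy document, and the translation table `trans` are hypothetical, and smoothing is omitted for brevity.

```python
from collections import Counter


def mle(doc_tokens):
    """Maximum-likelihood unigram model P(w|D) of a tokenized document."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def exact_match_likelihood(query, doc_tokens):
    """Unsmoothed query likelihood: product over query terms of P(q|D)."""
    p = mle(doc_tokens)
    score = 1.0
    for q in query:
        score *= p.get(q, 0.0)
    return score


def p_translate(q, w, trans):
    """P_t(q|w) from a hypothetical table; unlisted terms translate to themselves."""
    if w in trans:
        return trans[w].get(q, 0.0)
    return 1.0 if q == w else 0.0


def translation_likelihood(query, doc_tokens, trans):
    """Product over query terms of sum_w P_t(q|w) * P(w|D)."""
    p = mle(doc_tokens)
    score = 1.0
    for q in query:
        score *= sum(p_translate(q, w, trans) * pw for w, pw in p.items())
    return score


# Toy data (assumed for illustration only).
doc = ["tumor", "of", "the", "liver"]
query = ["cancer"]
trans = {"tumor": {"tumor": 0.7, "cancer": 0.3}}

exact = exact_match_likelihood(query, doc)
translated = translation_likelihood(query, doc, trans)
print(exact, translated)
```

With this toy input the exact-match score is zero because "cancer" never occurs in the document, while the translation model assigns a non-zero score through the assumed relation between "tumor" and "cancer". The quality of that score depends entirely on the translation probabilities, which is the estimation difficulty the abstract refers to.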
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
ALMasri, M., Tan, K., Berrut, C., Chevallet, JP., Mulhem, P. (2014). Integrating Semantic Term Relations into Information Retrieval Systems Based on Language Models. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_12
DOI: https://doi.org/10.1007/978-3-319-12844-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science (R0)