research-article

Generalizing Translation Models in the Probabilistic Relevance Framework

Published: 24 October 2016

Abstract

A recurring question in information retrieval is whether term associations can be properly integrated into traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language Modelling) by introducing a generalisation of the idea of the translation model. This generalisation is a de facto transfer of the translation models from Language Modelling to the probabilistic models. In doing so, we observe a potential limitation of these generalised translation models: they affect only the term-frequency-based components of the models and ignore the corresponding changes in document and collection statistics. We correct this limitation by extending the translation models with the statistics of term associations, and we provide extensive experimental results to demonstrate the benefit of the newly proposed methods. Additionally, we compare the translation models with query expansion methods based on the same term association resources, as well as with query expansion based on Pseudo-Relevance Feedback (PRF). We observe that translation models always outperform the former but provide information complementary to the latter, such that using PRF and our translation models together yields results better than the current state of the art.
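As a rough illustration of the idea summarised above, the following minimal sketch extends a term's frequency with similarity-weighted counts of related terms and feeds it into a BM25-style term score. It is not taken from the paper: the function names, the toy similarity weight, the collection statistics, and the particular idf variant are all assumptions made for the example. Note that the document frequency `df` is left unchanged, which mirrors the limitation the abstract describes.

```python
import math

def translated_tf(term, doc_tf, related):
    """Term frequency of `term` plus similarity-weighted counts of related terms.

    `related` maps a term to {related_term: weight}; the weights are hypothetical.
    """
    tf = float(doc_tf.get(term, 0))
    for other, weight in related.get(term, {}).items():
        tf += weight * doc_tf.get(other, 0)
    return tf

def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25-style score for one query term (one common idf variant, assumed here)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)

# Toy document and toy term association (illustrative values only).
doc_tf = {"car": 2, "automobile": 3}
related = {"car": {"automobile": 0.5}}
stats = dict(df=10, n_docs=1000, doc_len=100, avg_doc_len=100)

plain = bm25_term_score(doc_tf["car"], **stats)
extended = bm25_term_score(translated_tf("car", doc_tf, related), **stats)
print(plain, extended)
```

In this sketch only the term-frequency input changes between `plain` and `extended`; the collection statistics in `stats` stay fixed.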


Published In

CIKM '16: Proceedings of the 25th ACM International Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. IR models
  2. related terms
  3. translation model
  4. word embeddings

Qualifiers

  • Research-article

Funding Sources

  • Austrian Research Promotion Agency (FFG)
  • Austrian Science Fund (FWF)

Conference

CIKM'16
CIKM'16: ACM Conference on Information and Knowledge Management
October 24 - 28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate: 160 of 701 submissions, 23%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 2

Reflects downloads up to 20 Jan 2025.

Cited By

  • (2023) Approaches to Cross-Language Retrieval of Similar Legal Documents Based on Machine Learning. Scientific and Technical Information Processing 50:5 (494-499). DOI: 10.3103/S0147688223050167. Online publication date: 1-Dec-2023
  • (2021) A Modern Perspective on Query Likelihood with Deep Generative Retrieval Models. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval (185-195). DOI: 10.1145/3471158.3472229. Online publication date: 11-Jul-2021
  • (2021) Comparison of Cross-Lingual Similar Documents Retrieval Methods. Data Analytics and Management in Data Intensive Domains (216-229). DOI: 10.1007/978-3-030-81200-3_16. Online publication date: 16-Jul-2021
  • (2020) Different Approaches in Cross-Language Similar Documents Retrieval in the Legal Domain. Speech and Computer (679-686). DOI: 10.1007/978-3-030-60276-5_65. Online publication date: 29-Sep-2020
  • (2019) On the Effect of Low-Frequency Terms on Neural-IR Models. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (1137-1140). DOI: 10.1145/3331184.3331344. Online publication date: 18-Jul-2019
  • (2019) Contextually Propagated Term Weights for Document Representation. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (897-900). DOI: 10.1145/3331184.3331307. Online publication date: 18-Jul-2019
  • (2019) The Scholarly Impact and Strategic Intent of CLEF eHealth Labs from 2012 to 2017. Information Retrieval Evaluation in a Changing World (333-363). DOI: 10.1007/978-3-030-22948-1_14. Online publication date: 14-Aug-2019
  • (2019) Enriching Word Embeddings for Patent Retrieval with Global Context. Advances in Information Retrieval (810-818). DOI: 10.1007/978-3-030-15712-8_57. Online publication date: 7-Apr-2019
  • (2019) Iterative Relevance Feedback for Answer Passage Retrieval with Passage-Level Semantic Match. Advances in Information Retrieval (558-572). DOI: 10.1007/978-3-030-15712-8_36. Online publication date: 7-Apr-2019
  • (2018) Neural information retrieval. Information Retrieval 21:2-3 (111-182). DOI: 10.1007/s10791-017-9321-y. Online publication date: 1-Jun-2018
