DOI: 10.1145/1835449.1835546
research-article

Positional relevance model for pseudo-relevance feedback

Published: 19 July 2010

Abstract

Pseudo-relevance feedback is an effective technique for improving retrieval results. Traditional feedback algorithms treat a whole feedback document as the unit from which to extract words for query expansion. This is not optimal, because a document may cover several different topics and thus contain much irrelevant information. In this paper, we study how to use the positions of terms in feedback documents to select those words that are focused on the query topic. We propose a positional relevance model (PRM) to address this problem in a unified probabilistic way. The proposed PRM extends the relevance model to exploit term positions and proximity, assigning more weight to words closer to query words, based on the intuition that such words are more likely to be related to the query topic. We develop two methods to estimate PRM based on different sampling processes. Experimental results on two large retrieval datasets show that the proposed PRM is effective and robust for pseudo-relevance feedback, significantly outperforming the relevance model in both document-based feedback and passage-based feedback.
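
The proximity intuition in the abstract can be illustrated with a small sketch. This is not the paper's PRM estimation (the abstract does not give the formulas); it is a simplified stand-in that assumes a Gaussian kernel over token distance. The function name `proximity_weighted_terms` and the bandwidth parameter `sigma` are hypothetical choices made for this illustration only.

```python
import math
from collections import defaultdict


def proximity_weighted_terms(doc_tokens, query_terms, sigma=2.0):
    """Weight candidate expansion terms by closeness to query-term positions.

    Assumption (not from the paper): a Gaussian kernel over token distance,
    with a hypothetical bandwidth `sigma`. Returns weights that sum to 1,
    or an empty dict if the document contains no query term.
    """
    query_terms = set(query_terms)
    # Positions in the feedback document where a query term occurs.
    query_positions = [i for i, tok in enumerate(doc_tokens) if tok in query_terms]

    scores = defaultdict(float)
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            continue  # only non-query words are candidate expansion terms
        for j in query_positions:
            # Contribution decays smoothly with distance |i - j|.
            scores[tok] += math.exp(-((i - j) ** 2) / (2 * sigma ** 2))

    total = sum(scores.values())
    if total == 0:
        return {}
    return {term: score / total for term, score in scores.items()}


if __name__ == "__main__":
    doc = "jaguar cat habitat filler filler filler filler stock market".split()
    weights = proximity_weighted_terms(doc, ["jaguar"])
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {weight:.4f}")
```

In this toy document, words next to the query term "jaguar" receive most of the weight, while words at the far end of the document receive almost none. A whole-document approach would instead treat every position equally.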

Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN:9781450301534
DOI:10.1145/1835449
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. passage-based feedback
  2. positional language model
  3. positional relevance model
  4. proximity
  5. pseudo relevance feedback
  6. query expansion

Qualifiers

  • Research-article

Conference

SIGIR '10

Acceptance Rates

SIGIR '10 paper acceptance rate: 87 of 520 submissions (17%)
Overall acceptance rate: 792 of 3,983 submissions (20%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 2
Reflects downloads up to 20 Jan 2025

Citations

Cited By

  • (2024) How to personalize and whether to personalize? Candidate documents decide. Knowledge and Information Systems 66:9 (5581-5604). DOI: 10.1007/s10115-024-02138-y. Online publication date: 27-May-2024
  • (2024) Event-Specific Document Ranking Through Multi-stage Query Expansion Using an Event Knowledge Graph. Advances in Information Retrieval (333-348). DOI: 10.1007/978-3-031-56060-6_22. Online publication date: 16-Mar-2024
  • (2023) Personalized Query Expansion with Contextual Word Embeddings. ACM Transactions on Information Systems 42:2 (1-35). DOI: 10.1145/3624988. Online publication date: 20-Sep-2023
  • (2023) Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Transactions on Information Systems 41:3 (1-40). DOI: 10.1145/3570724. Online publication date: 10-Apr-2023
  • (2023) ColBERT-FairPRF: Towards Fair Pseudo-Relevance Feedback in Dense Retrieval. Advances in Information Retrieval (457-465). DOI: 10.1007/978-3-031-28238-6_36. Online publication date: 17-Mar-2023
  • (2022) A Concept Net-based semantic constraint method for query expansion. 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) (906-913). DOI: 10.1109/WI-IAT55865.2022.00147. Online publication date: Nov-2022
  • (2022) A Large Scale Document-Term Matching Method Based on Information Retrieval. 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (323-330). DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00048. Online publication date: Dec-2022
  • (2022) A probabilistic framework for integrating sentence-level semantics via BERT into pseudo-relevance feedback. Information Processing and Management: an International Journal 59:1. DOI: 10.1016/j.ipm.2021.102734. Online publication date: 9-Apr-2022
  • (2022) A Dependency-Aware Utterances Permutation Strategy to Improve Conversational Evaluation. Advances in Information Retrieval (184-198). DOI: 10.1007/978-3-030-99736-6_13. Online publication date: 5-Apr-2022
  • (2022) Biomedical Data Retrieval Using Enhanced Query Expansion. Handbook of Smart Materials, Technologies, and Devices (1921-1956). DOI: 10.1007/978-3-030-84205-5_63. Online publication date: 10-Nov-2022
