DOI: 10.1145/1571941.1571994
research-article

Positional language models for information retrieval

Published: 19 July 2009

Abstract

Although many variants of language models have been proposed for information retrieval, two related retrieval heuristics remain "external" to the language modeling approach: (1) the proximity heuristic, which rewards a document where the matched query terms occur close to each other; and (2) passage retrieval, which scores a document mainly based on its best-matching passage. Existing studies have only attempted to use a standard language model as a "black box" to implement these heuristics, making it hard to optimize the combination parameters.
In this paper, we propose a novel positional language model (PLM) that implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document and to score a document based on the scores of its PLMs. The PLM is estimated from propagated counts of words within a document through a proximity-based density function, which both captures the proximity heuristic and achieves an effect of "soft" passage retrieval. We propose and study several representative density functions and several different PLM-based document ranking strategies. Experimental results on standard TREC test collections show that the PLM is effective for passage retrieval and performs better than a state-of-the-art proximity-based retrieval model.
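
The idea of propagated counts can be illustrated with a minimal sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: a Gaussian kernel stands in for the density function (the abstract mentions several without naming them here), a small constant `eps` is a crude floor in place of proper smoothing, and scoring by the best position is just one possible ranking strategy. All function names are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_kernel(i, j, sigma):
    """Weight propagated from position j to position i (assumed Gaussian shape)."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def positional_lm(tokens, i, sigma=2.0):
    """Estimate a language model at position i from propagated word counts."""
    counts = defaultdict(float)
    for j, word in enumerate(tokens):
        # Each occurrence contributes a fractional count that decays with distance.
        counts[word] += gaussian_kernel(i, j, sigma)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def position_score(query, tokens, i, sigma=2.0, eps=1e-6):
    """Query log-likelihood under the PLM at position i (eps is an assumed floor)."""
    plm = positional_lm(tokens, i, sigma)
    return sum(math.log(plm.get(q, 0.0) + eps) for q in query)


def best_position_score(query, tokens, sigma=2.0):
    """Score a document by its best-scoring position (one assumed strategy)."""
    return max(position_score(query, tokens, i, sigma) for i in range(len(tokens)))


query = ["positional", "retrieval"]
near = ["positional", "retrieval", "x", "x", "x", "x", "x", "x"]
far = ["positional", "x", "x", "x", "x", "x", "x", "retrieval"]
print(best_position_score(query, near) > best_position_score(query, far))
```

In this toy setup, the document whose query terms sit next to each other scores higher than the one where they are far apart, which is the behaviour the proximity heuristic asks for. Because every position gets its own model, taking the best position also behaves like a passage with soft boundaries.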

    Published In

    SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
    July 2009
    896 pages
    ISBN: 9781605584836
    DOI: 10.1145/1571941

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. passage retrieval
    2. positional language models
    3. proximity

    Qualifiers

    • Research-article

    Conference

    SIGIR '09

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2024) Innovating Patent Retrieval: A Comprehensive Review of Techniques, Trends, and Challenges in Prior Art Searches. Applied System Innovation 7:5 (91). DOI: 10.3390/asi7050091. Online publication date: 26-Sep-2024
    • (2024) Dynamic Segmentation for Efficient Retrieval of Podcasts: The Repping Algorithm. Proceedings of the 2024 International Conference on Multimedia Retrieval (29-36). DOI: 10.1145/3652583.3658047. Online publication date: 30-May-2024
    • (2024) Utilizing passage‐level relevance and kernel pooling for enhancing BERT‐based document reranking. Computational Intelligence 40:3. DOI: 10.1111/coin.12656. Online publication date: 7-Jun-2024
    • (2024) SoftQE: Learned Representations of Queries Expanded by LLMs. Advances in Information Retrieval (68-77). DOI: 10.1007/978-3-031-56066-8_8. Online publication date: 24-Mar-2024
    • (2023) Summarizing Financial Reports with Positional Language Model. 2023 IEEE International Conference on Big Data (BigData) (2877-2883). DOI: 10.1109/BigData59044.2023.10386704. Online publication date: 15-Dec-2023
    • (2021) Effective Query Formulation in Conversation Contextualization: A Query Specificity-based Approach. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval (177-183). DOI: 10.1145/3471158.3472237. Online publication date: 11-Jul-2021
    • (2021) Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2491-2498). DOI: 10.1145/3404835.3463259. Online publication date: 11-Jul-2021
    • (2021) A Principled Approach Using Fuzzy Set Theory for Passage-Based Document Retrieval. IEEE Transactions on Fuzzy Systems 29:7 (1967-1977). DOI: 10.1109/TFUZZ.2020.2990110. Online publication date: Jul-2021
    • (2021) A Survey of Vietnamese Automatic Speech Recognition. 2021 9th International Conference on Orange Technology (ICOT) (1-4). DOI: 10.1109/ICOT54518.2021.9680652. Online publication date: 16-Dec-2021
    • (2021) HeadlineStanceChecker: Exploiting summarization to detect headline disinformation. Journal of Web Semantics (100660). DOI: 10.1016/j.websem.2021.100660. Online publication date: Sep-2021
