DOI: 10.1145/1277741.1277797
Article

A study of Poisson query generation model for information retrieval

Published: 23 July 2007

Abstract

Many variants of language models have been proposed for information retrieval. Most existing models are based on the multinomial distribution and score documents by query likelihood computed from a probabilistic query generation model. In this paper, we propose and study a new family of query generation models based on the Poisson distribution. We show that, in their simplest forms, the new family of models and the existing multinomial models are equivalent; with different smoothing methods, however, the two families behave differently. We show that the Poisson model has several advantages, including naturally accommodating per-term smoothing and modeling an accurate background more efficiently. We present several variants of the new model corresponding to different smoothing methods and evaluate them on four representative TREC test collections. The results show that while the basic models perform comparably, the Poisson model can outperform the multinomial model with per-term smoothing. Performance can be further improved with two-stage smoothing.
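
The equivalence claimed for the simplest (unsmoothed) forms can be checked numerically. The sketch below is not the paper's implementation. It assumes each vocabulary term's query count is an independent Poisson variable with mean |q| * c(w,d)/|d| (a maximum-likelihood rate scaled by query length), and the function names and toy documents are made up for illustration. Under these assumptions the Poisson log-likelihood differs from the multinomial one only by a quantity that does not depend on the document, so both rank documents identically.

```python
import math
from collections import Counter


def multinomial_log_likelihood(query, doc):
    """Unsmoothed unigram query log-likelihood.

    The document-independent multinomial coefficient is dropped.
    """
    doc_counts, doc_len = Counter(doc), len(doc)
    total = 0.0
    for term, k in Counter(query).items():
        if doc_counts[term] == 0:
            return float("-inf")  # unseen term: zero probability without smoothing
        total += k * math.log(doc_counts[term] / doc_len)
    return total


def poisson_log_likelihood(query, doc, vocab):
    """Query log-likelihood with one independent Poisson per vocabulary term.

    Assumed rate for term w: |q| * c(w, d) / |d| (illustrative choice).
    """
    doc_counts, doc_len = Counter(doc), len(doc)
    query_counts, query_len = Counter(query), len(query)
    total = 0.0
    for term in vocab:
        mean = query_len * doc_counts[term] / doc_len
        k = query_counts[term]
        if mean == 0.0:
            if k > 0:
                return float("-inf")
            continue  # Poisson with mean 0 puts all its mass on k = 0
        total += -mean + k * math.log(mean) - math.lgamma(k + 1)
    return total


# Toy data, purely illustrative.
docs = {
    "d1": "poisson model for query generation in retrieval".split(),
    "d2": "query likelihood retrieval with a multinomial query model".split(),
}
query = "query model".split()
vocab = sorted({term for doc in docs.values() for term in doc})

for name, doc in docs.items():
    m = multinomial_log_likelihood(query, doc)
    p = poisson_log_likelihood(query, doc, vocab)
    # The gap p - m is the same for every document.
    print(name, round(m, 4), round(p, 4), round(p - m, 4))
```

The constant gap arises because the rates sum to |q| over the vocabulary for every document, so the exponential term of the Poisson product cancels when documents are compared. Once smoothing changes the rates per term, this cancellation need not hold, which is consistent with the abstract's remark that the two families then behave differently.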


    Published In

    SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2007
    946 pages
    ISBN:9781595935977
    DOI:10.1145/1277741

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. formal models
    2. poisson process
    3. query generation
    4. term dependent smoothing

    Qualifiers

    • Article

    Conference

    SIGIR07: The 30th Annual International SIGIR Conference
    July 23 - 27, 2007
    Amsterdam, The Netherlands

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

