Two-Poisson model

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Harter’s model; Probabilistic model of indexing

Definition

The 2-Poisson model is a mixture, that is, a convex linear combination, of two Poisson distributions with means λ and μ:

$$\mathrm{Prob}(X = \mathbf{tf}) = \alpha\,\frac{\lambda^{\mathbf{tf}}\, e^{-\lambda}}{\mathbf{tf}!} + (1 - \alpha)\,\frac{\mu^{\mathbf{tf}}\, e^{-\mu}}{\mathbf{tf}!} \quad [0 \le \alpha \le 1]$$

In information retrieval (IR), the 2-Poisson model is used to model the probability distribution of the within-document frequency X of a term across a collection of documents, where tf denotes an observed value of X.
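
As a minimal sketch of the definition, the mixture can be evaluated directly from the formula. The function names `poisson_pmf` and `two_poisson_pmf` and the parameter values below are illustrative assumptions, not part of the original entry or of any fitted model.

```python
import math


def poisson_pmf(tf: int, rate: float) -> float:
    """Poisson probability of observing the count tf for a given mean rate."""
    return rate ** tf * math.exp(-rate) / math.factorial(tf)


def two_poisson_pmf(tf: int, alpha: float, lam: float, mu: float) -> float:
    """Prob(X = tf) under a mixture of Poisson(lam) and Poisson(mu).

    alpha is the weight of the first component and must lie in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * poisson_pmf(tf, lam) + (1.0 - alpha) * poisson_pmf(tf, mu)


# Hypothetical parameters, chosen only for illustration.
alpha, lam, mu = 0.3, 4.0, 0.5

# Probabilities for tf = 0..59; the tail beyond that is negligible here.
probs = [two_poisson_pmf(tf, alpha, lam, mu) for tf in range(60)]
total = sum(probs)
mean = sum(tf * p for tf, p in enumerate(probs))
```

With these values, `total` is numerically close to 1, and `mean` is close to αλ + (1 − α)μ = 1.55, the expected value of the mixture.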

Historical Background

The 2-Poisson model was proposed by Harter [5–7], although Bookstein [1,2] and Harter had been exchanging ideas about probabilistic models of indexing during those years. Harter coined the word “elite” to introduce his 2-Poisson model [5, pp. 68–74].

The origin of the 2-Poisson model can be traced back through Luhn, Maron, Damerau, and Edmundson and Wyllys [3,4,8,9]. The first accounts on Poisson distribution modeling the...

Recommended Reading

  1. Bookstein A. and Kraft D. Operations research applied to document indexing and retrieval decisions. J. ACM, 24(3):418–427, 1977.

  2. Bookstein A. and Swanson D. Probabilistic models for automatic indexing. J. Am. Soc. Inform. Sci., 25:312–318, 1974.

  3. Damerau F. An experiment in automatic indexing. Am. Doc., 16:283–289, 1965.

  4. Edmundson H.P. and Wyllys R.E. Automated abstracting and indexing–survey and recommendations. Commun. ACM, 4(5):226–234, May 1961. Reprinted in Readings in Information Retrieval, pp. 390–412. H. Sharp (ed.). New York, NY: Scarecrow, 1964.

  5. Harter S.P. A probabilistic approach to automatic keyword indexing. PhD thesis, Graduate Library, The University of Chicago, Thesis No. T25146, 1974.

  6. Harter S.P. A probabilistic approach to automatic keyword indexing. Part I: On the distribution of specialty words in a technical literature. J. Am. Soc. Inform. Sci., 26:197–216, 1975.

  7. Harter S.P. A probabilistic approach to automatic keyword indexing. Part II: An algorithm for probabilistic indexing. J. Am. Soc. Inform. Sci., 26:280–289, 1975.

  8. Luhn H.P. A statistical approach to mechanized encoding and searching of literary information. IBM J. Res. Dev., 1:309–317, 1957.

  9. Maron M.E. Automatic indexing: an experimental inquiry. J. ACM, 8:404–417, 1961.

  10. Puri P.S. and Goldie C.M. Poisson mixtures and quasi-infinite divisibility of distributions. J. Appl. Probab., 16(1):138–153, 1979.

  11. Stone D. and Rubinoff B. Statistical generation of a technical vocabulary. Am. Doc., 19(4):411–412, 1968.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this entry

Cite this entry

Amati, G. (2009). Two-Poisson model. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_920
