
A Reconstruction of Ewens’ Sampling Formula via Lists of Coins


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13560)

Abstract

This overview paper starts from an elementary fact about lists of numbers (coins) for which a simple arithmetical proof is lacking. The paper does provide a proof, but via probabilistic reasoning, using iterations of probabilistic functions (channels) or, equivalently, iterations of transitions in a probabilistic automaton. The formulas involved capture mutations, with a rate parameter, as developed some fifty years ago in population biology by Warren Ewens. Here, Ewens' sampling formula is reconstructed in a theoretical computer science setting, first for lists and then also for multisets, as in the original work. The methods for describing such mutations have wider significance beyond biology, for instance in machine learning, when the number of clusters in a classification problem may grow.

Dedicated to my dear colleague Frits Vaandrager, on the occasion of his 60th birthday.
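
For orientation, Ewens' sampling formula in its standard textbook form assigns to a partition of n with a_j parts of size j the probability n! / (θ(θ+1)...(θ+n-1)) times the product over j of θ^{a_j} / (j^{a_j} a_j!), where θ is the mutation rate. The sketch below is not taken from the chapter: it uses integer partitions in place of the chapter's lists of coins, and the names `partitions`, `ewens_probability`, `ewens_distribution` and `mutation_step` are illustrative assumptions. It checks, with exact rational arithmetic, that iterating a simple mutation step (the usual Hoppe urn or Chinese restaurant rule, assumed here as a stand-in for the channel iteration mentioned in the abstract) reproduces the formula.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield every integer partition of n as a non-increasing tuple of parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def ewens_probability(parts, theta):
    """Standard Ewens sampling formula for one partition (theta > 0)."""
    n = sum(parts)
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i
    prob = Fraction(factorial(n)) / rising
    for size in set(parts):
        a = parts.count(size)  # number of parts of this size
        prob *= Fraction(theta) ** a / (size ** a * factorial(a))
    return prob


def ewens_distribution(n, theta):
    """Map each partition of n to its Ewens probability."""
    return {p: ewens_probability(p, theta) for p in partitions(n)}


def mutation_step(dist, theta):
    """Send a distribution on partitions of n to one on partitions of n + 1.

    Assumed rule (Hoppe urn): with probability theta / (theta + n) a mutation
    creates a new part of size 1; otherwise an existing part of size s grows
    by one, with probability s / (theta + n).
    """
    out = {}
    for parts, prob in dist.items():
        total = theta + sum(parts)
        new = tuple(sorted(parts + (1,), reverse=True))
        out[new] = out.get(new, 0) + prob * Fraction(theta) / total
        for i, s in enumerate(parts):
            grown = tuple(sorted(parts[:i] + (s + 1,) + parts[i + 1:], reverse=True))
            out[grown] = out.get(grown, 0) + prob * Fraction(s) / total
    return out


if __name__ == "__main__":
    theta = Fraction(3, 2)
    dist = {(): Fraction(1)}  # start from the empty partition
    for n in range(1, 7):
        dist = mutation_step(dist, theta)
        assert dist == ewens_distribution(n, theta)
    print("iterated mutation steps agree with the formula for n = 1..6")
```

Using `Fraction` keeps every probability exact, so the comparison between the iterated distribution and the closed formula is an equality test rather than a floating-point approximation.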



Acknowledgements

Thanks are due to Ceel Pierik for helpful discussion on the material in Sect. 5.

Author information

Correspondence to Bart Jacobs.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Jacobs, B. (2022). A Reconstruction of Ewens’ Sampling Formula via Lists of Coins. In: Jansen, N., Stoelinga, M., van den Bos, P. (eds) A Journey from Process Algebra via Timed Automata to Model Learning. Lecture Notes in Computer Science, vol 13560. Springer, Cham. https://doi.org/10.1007/978-3-031-15629-8_18

  • DOI: https://doi.org/10.1007/978-3-031-15629-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15628-1

  • Online ISBN: 978-3-031-15629-8

  • eBook Packages: Computer Science (R0)
