Abstract
This overview paper starts from an elementary fact about lists of numbers (coins) for which a simple arithmetical proof is lacking. The paper does provide a proof, but via probabilistic reasoning, using iterations of probabilistic functions (channels) or, equivalently, iterations of transitions in a probabilistic automaton. The formulas involved capture mutations, with a rate parameter, as developed some fifty years ago in population biology by Warren Ewens. Here, Ewens' sampling formula is reconstructed in a theoretical computer science setting, first for lists and then also for multisets, as in the original work. The methods for describing such mutations have wider significance beyond biology, for instance in machine learning, where the number of clusters in a classification problem may grow.
Dedicated to my dear colleague Frits Vaandrager, on the occasion of his 60th birthday.
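The abstract does not state the formula itself. As a rough illustration of the multiset version it refers to, the sketch below uses the standard textbook form of Ewens' sampling formula, which is an assumption here and not taken from the chapter: a partition of n with a_j parts of size j has probability n! / (θ(θ+1)…(θ+n−1)) · ∏_j θ^{a_j} / (j^{a_j} · a_j!), where θ is the mutation rate. The helper names `partitions` and `ewens` are hypothetical, and the chapter's own construction via lists of coins and channels is not reproduced.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield each partition of n as a dict {part size: multiplicity}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    # Pick the largest part size and how often it occurs, then recurse
    # on the remainder using strictly smaller part sizes.
    for size in range(min(n, max_part), 0, -1):
        for count in range(n // size, 0, -1):
            for rest in partitions(n - size * count, size - 1):
                yield {size: count, **rest}


def ewens(partition, theta):
    """Probability of a partition under the (assumed) standard Ewens formula.

    `partition` maps part size j to multiplicity a_j; `theta` is the
    mutation rate, given as an int or a Fraction for exact arithmetic.
    """
    n = sum(size * count for size, count in partition.items())
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # rising factorial theta (theta+1) ... (theta+n-1)
    prob = Fraction(factorial(n)) / rising
    for size, count in partition.items():
        prob *= Fraction(theta) ** count / (size ** count * factorial(count))
    return prob


if __name__ == "__main__":
    theta = Fraction(3, 2)
    total = sum(ewens(p, theta) for p in partitions(5))
    print(total)  # the probabilities over all partitions of 5 add up to 1
```

Exact rational arithmetic via `Fraction` is used so that the normalisation check is not blurred by floating-point rounding. With θ = 1 the formula reduces to the cycle-type distribution of a uniformly random permutation.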
Acknowledgements
Thanks are due to Ceel Pierik for helpful discussion on the material in Sect. 5.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jacobs, B. (2022). A Reconstruction of Ewens’ Sampling Formula via Lists of Coins. In: Jansen, N., Stoelinga, M., van den Bos, P. (eds) A Journey from Process Algebra via Timed Automata to Model Learning. Lecture Notes in Computer Science, vol 13560. Springer, Cham. https://doi.org/10.1007/978-3-031-15629-8_18
DOI: https://doi.org/10.1007/978-3-031-15629-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15628-1
Online ISBN: 978-3-031-15629-8
eBook Packages: Computer Science, Computer Science (R0)