The computing of the Poisson multinomial distribution and applications in ecological inference and machine learning

  • Original paper
  • Computational Statistics

Abstract

The Poisson multinomial distribution (PMD) describes the distribution of the sum of n independent but non-identically distributed random vectors, each of length m with 0/1-valued elements, exactly one of which takes the value 1. The probability of taking the value 1 differs across the m elements and across the n random vectors; these probabilities form an \(n \times m\) matrix in which each row sums to 1. We call this \(n \times m\) matrix the success probability matrix (SPM). Each SPM uniquely defines a PMD. The PMD is useful in many areas, such as voting theory, ecological inference, and machine learning. The distribution functions of the PMD, however, are usually difficult to compute, and no efficient algorithm is available for computing them. In this paper, we develop efficient methods to compute the probability mass function (pmf) of the PMD using the multivariate Fourier transform, normal approximation, and simulation. We study the accuracy and efficiency of these methods and recommend which method to use under various scenarios. We also illustrate the use of the PMD via three applications: ecological inference, uncertainty quantification in classification, and voting probability calculation. We build an R package that implements the proposed methods and illustrate the package with examples. This paper has online supplementary materials.
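
To make the definition concrete, the following is a minimal Python sketch, not the authors' R package, of two of the ideas named in the abstract: computing the pmf through a multivariate discrete Fourier transform and estimating it by simulation. The function names `pmd_pmf_dft` and `pmd_pmf_sim`, the example SPM, and the choice of NumPy are assumptions made for illustration. The sketch evaluates the characteristic function of the first m-1 category counts on an \((n+1)^{m-1}\) grid and inverts it with an FFT, so it is only practical for small n and m.

```python
import numpy as np


def pmd_pmf_dft(spm):
    """Pmf of the first m-1 category counts of a PMD via a multivariate DFT.

    Illustrative sketch: returns an array of shape (n+1,)*(m-1) whose entry
    [x1, ..., x_{m-1}] is P(X1 = x1, ..., X_{m-1} = x_{m-1}); the last count
    is implied by X_m = n - x1 - ... - x_{m-1}.
    """
    spm = np.asarray(spm, dtype=float)
    n, m = spm.shape
    if m < 2:
        raise ValueError("the SPM needs at least two columns")
    if not np.allclose(spm.sum(axis=1), 1.0):
        raise ValueError("each row of the SPM must sum to 1")
    # Grid points exp(2*pi*i*k/(n+1)), k = 0..n, one axis per free category.
    omega = np.exp(2j * np.pi * np.arange(n + 1) / (n + 1))
    grids = np.meshgrid(*([omega] * (m - 1)), indexing="ij")
    # Characteristic function: product over the n independent random vectors.
    phi = np.ones((n + 1,) * (m - 1), dtype=complex)
    for row in spm:
        term = np.full(phi.shape, row[-1], dtype=complex)
        for j in range(m - 1):
            term += row[j] * grids[j]
        phi *= term
    # fftn uses a negative exponent, which inverts the transform above.
    pmf = np.fft.fftn(phi).real / (n + 1) ** (m - 1)
    return np.clip(pmf, 0.0, None)  # remove tiny negative rounding errors


def pmd_pmf_sim(spm, draws=20000, seed=0):
    """Monte Carlo estimate of the same pmf array (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    spm = np.asarray(spm, dtype=float)
    n, m = spm.shape
    # Inverse-CDF sampling: one category per random vector per draw.
    cdf = np.cumsum(spm, axis=1)
    u = rng.random((draws, n))
    cats = np.minimum((u[:, :, None] > cdf[None, :, :]).sum(axis=2), m - 1)
    # Count how often each of the first m-1 categories occurs in each draw.
    counts = np.stack([(cats == j).sum(axis=1) for j in range(m - 1)], axis=1)
    est = np.zeros((n + 1,) * (m - 1))
    np.add.at(est, tuple(counts.T), 1.0)
    return est / draws


# Hypothetical 3 x 3 SPM: three random vectors, three categories.
spm = np.array([[0.1, 0.2, 0.7],
                [0.5, 0.3, 0.2],
                [0.3, 0.3, 0.4]])
pmf = pmd_pmf_dft(spm)   # pmf[x1, x2] = P(X1 = x1, X2 = x2)
est = pmd_pmf_sim(spm)   # simulation-based estimate of the same array
```

With m = 2 the same function reduces to the Poisson binomial case, which gives a convenient sanity check. The simulation estimate trades exactness for a cost that does not grow with the size of the frequency grid.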

Acknowledgements

The authors thank the editor, associate editor, and two referees for their valuable comments, which helped improve the paper significantly. The authors acknowledge the Advanced Research Computing program at Virginia Tech for providing computational resources. The work by Hong was partially supported by the Virginia Tech College of Science Research Equipment Fund.

Author information

Corresponding author

Correspondence to Yili Hong.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 166 KB)

About this article

Cite this article

Lin, Z., Wang, Y. & Hong, Y. The computing of the Poisson multinomial distribution and applications in ecological inference and machine learning. Comput Stat 38, 1851–1877 (2023). https://doi.org/10.1007/s00180-022-01299-0

  • DOI: https://doi.org/10.1007/s00180-022-01299-0
