Efficient time/space algorithm to compute rectangular probabilities of multinomial, multivariate hypergeometric and multivariate Pólya distributions

Statistics and Computing

Abstract

The computation of rectangular probabilities of multivariate discrete integer distributions such as the multinomial, multivariate hypergeometric or multivariate Pólya distributions is of great interest both for statistical applications and for probabilistic modeling purposes. All these distributions are members of a broader family of multivariate discrete integer distributions for which computationally efficient approximate methods have been proposed for the evaluation of such probabilities, but with no control over their accuracy. Recently, exact algorithms have been proposed for computing such probabilities, but they are dedicated either to a specific distribution or to very specific rectangular probabilities. We propose a new algorithm that computes arbitrary rectangular probabilities in the most general case. Its accuracy matches or even exceeds that of the exact algorithms once rounding errors are taken into account. In the worst case, its computational cost is the same as that of the most efficient exact method published so far, and it is much lower in many situations of interest. It requires no storage beyond that needed for the parameters of the distribution, so it can handle applications with large dimension or large counting parameter at no extra memory cost and with an acceptable computation time, which is a major difference from the methods published so far.
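To make the quantity concrete: a rectangular probability is the probability that every component of the random count vector falls within its own interval, i.e. that a_i ≤ X_i ≤ b_i for all i. The sketch below evaluates it for a multinomial distribution by exhaustive enumeration. It is only a naive reference computation, not the algorithm proposed in the paper, and the function and parameter names (`multinomial_pmf`, `rectangular_probability`, `lower`, `upper`) are illustrative assumptions.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of one count vector under a multinomial distribution."""
    coeff = factorial(sum(counts))
    for c in counts:
        # Each partial quotient is an integer, so exact division is safe.
        coeff //= factorial(c)
    return coeff * prod(p ** c for p, c in zip(probs, counts))


def rectangular_probability(n, probs, lower, upper):
    """P(lower[i] <= X[i] <= upper[i] for all i), by brute-force enumeration.

    n     : number of trials (the counting parameter)
    probs : cell probabilities, assumed to sum to 1
    """
    # Clip each interval to the feasible range [0, n].
    ranges = [range(max(lo, 0), min(hi, n) + 1) for lo, hi in zip(lower, upper)]
    total = 0.0
    for counts in product(*ranges):
        # Only count vectors summing to n have non-zero probability.
        if sum(counts) == n:
            total += multinomial_pmf(counts, probs)
    return total


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    # Probability that no cell receives more than one of the 3 trials.
    print(rectangular_probability(3, probs, [0, 0, 0], [1, 1, 1]))
```

In the usage example the only admissible outcome is (1, 1, 1), so the result equals 3! × 0.5 × 0.3 × 0.2 = 0.18 up to rounding. The number of candidate vectors visited grows as the product of the interval lengths, which is why this kind of enumeration becomes impractical for large dimension or large counting parameter.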

Notes

  1. The notation \(t=\Theta(N)\) means that t is bounded above and below by a linear function of N, while \(t=\mathcal{O}(N)\) means that t is only bounded above by a linear function of N. The distinction matters here because the argument in the proof differs depending on whether t is \(\Theta(N)\), or \(\mathcal{O}(N)\) without being \(\Theta(N)\).

  2. For which the wrong value of 0.030837 is reported (using a storage of 1373701 floating point numbers for the computation).

  3. For which the wrong values of resp. 0.877373 and 0.750895 are reported.
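The distinction drawn in note 1 can be written out formally. The constants \(c_1\), \(c_2\) and the threshold \(N_0\) below are the usual ones from the standard definitions and are not notation taken from the paper.

```latex
\begin{align*}
t = \Theta(N) &\iff \exists\, c_1, c_2 > 0,\ \exists\, N_0 :\quad
  c_1 N \le t \le c_2 N \quad \text{for all } N \ge N_0, \\
t = \mathcal{O}(N) &\iff \exists\, c_2 > 0,\ \exists\, N_0 :\quad
  t \le c_2 N \quad \text{for all } N \ge N_0 .
\end{align*}
```

For instance, \(t = N/2\) is \(\Theta(N)\), whereas \(t = \sqrt{N}\) is \(\mathcal{O}(N)\) without being \(\Theta(N)\), since no positive multiple of N stays below it for all large N.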

References

  • Abate, J., Whitt, W.: The Fourier-series method for inverting transforms of probability distributions. Queueing Syst. 10(1–2), 5–87 (1992)

  • Berry, K.J., Mielke, P.W. Jr.: Exact cumulative probabilities for the multinomial distribution. Technical report 19, Colorado State University (1995)

  • Butler, R.W., Sutton, R.K.: Saddlepoint approximation for multivariate cumulative distribution functions and probability computations in sampling theory and outlier testing. J. Am. Stat. Assoc. 93(442), 596–604 (1998)

  • Childs, A., Balakrishnan, N.: Some approximations to the multivariate hypergeometric distribution with applications to hypothesis testing. Comput. Stat. Data Anal. 35(2), 137–154 (2000)

  • Corrado, C.J.: The exact distribution of the maximum, minimum and the range of multinomial/Dirichlet and multivariate hypergeometric frequencies. Stat. Comput. 21, 349–359 (2011)

  • Didonato, A.R., Morris, A.H.: Computation of the incomplete gamma function ratios and their inverse. ACM Trans. Math. Softw. 12(4), 377–393 (1986)

  • Frey, J.: An algorithm for computing rectangular multinomial probabilities. J. Stat. Comput. Simul. 79(12), 1483–1489 (2009)

  • Good, I.J.: Saddle-point methods for the multinomial distribution. Ann. Math. Stat. 4, 861–881 (1957)

  • Johnson, N.L.: An approximation to the multinomial distribution: some properties and applications. Biometrika 47(1–2), 93–102 (1960)

  • Levin, B.: A representation for multinomial cumulative distribution functions. Ann. Math. Stat. 9(5), 1123–1126 (1981)

  • Levin, B.: On calculations involving the maximum cell frequency. Commun. Stat. 12(11), 1299–1327 (1983)

  • Levin, B.: Siobhan’s problem: the coupon collector revisited. Am. Stat. 46, 76 (1992)

  • Mallows, C.L.: An inequality involving multinomial probabilities. Biometrika 55, 422–424 (1968)

  • Maple computer algebra system, www.maplesoft.com

  • Open TURNS software, www.openturns.org

  • Python programming language, www.python.org

  • Temme, N.M.: A set of algorithms for the incomplete gamma functions. Probab. Eng. Inf. Sci. 8, 291 (1994)

Acknowledgements

I would like to thank Prof. Kenneth J. Berry and Prof. Jesse Frey, who kindly provided the source code of their algorithms and additional bibliographical material, as well as the two anonymous reviewers for their very valuable remarks and advice, which significantly improved the manuscript.

Author information

Corresponding author

Correspondence to R. Lebrun.

About this article

Cite this article

Lebrun, R. Efficient time/space algorithm to compute rectangular probabilities of multinomial, multivariate hypergeometric and multivariate Pólya distributions. Stat Comput 23, 615–623 (2013). https://doi.org/10.1007/s11222-012-9334-8

  • DOI: https://doi.org/10.1007/s11222-012-9334-8
