Abstract
The computation of rectangular probabilities of multivariate discrete integer distributions such as the multinomial, multivariate hypergeometric or multivariate Pólya distributions is of great interest both for statistical applications and for probabilistic modeling purposes. All these distributions are members of a broader family of multivariate discrete integer distributions for which computationally efficient approximate methods have been proposed for the evaluation of such probabilities, but with no control over their accuracy. Recently, exact algorithms have been proposed for computing such probabilities, but they are dedicated either to a specific distribution or to very specific rectangular probabilities. We propose a new algorithm that computes arbitrary rectangular probabilities in the most general case. Its accuracy matches or even exceeds that of the exact algorithms once rounding errors are taken into account. In the worst case, its computational cost is the same as that of the most efficient exact method published so far, and it is much lower in many situations of interest. It requires no storage beyond that needed for the parameters of the distribution, so applications with a large dimension or a large counting parameter can be handled at no extra memory cost and in an acceptable computation time, which is a major difference from the methods published so far.
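To make the quantity concrete, the following is a minimal brute-force sketch of a rectangular probability for the multinomial case, that is, the probability that every count X_i falls between a lower and an upper bound. It is not the algorithm proposed in this article: it simply enumerates every admissible count vector, so its cost grows exponentially with the dimension and it is only usable for small problems. The function names `multinomial_pmf` and `rectangular_probability` are illustrative assumptions, not taken from the article or from any library.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X = counts) for X ~ Multinomial(sum(counts), probs)."""
    coeff = factorial(sum(counts))
    for k in counts:
        # Each integer division is exact: the running value stays an integer.
        coeff //= factorial(k)
    return coeff * prod(p ** k for p, k in zip(probs, counts))


def rectangular_probability(n, probs, lower, upper):
    """Brute-force P(lower[i] <= X[i] <= upper[i] for all i).

    Assumes non-negative integer bounds; X ~ Multinomial(n, probs).
    """
    d = len(probs)

    def walk(i, remaining, prefix):
        if i == d - 1:
            # The last count is fixed by the constraint sum(X) == n.
            if lower[i] <= remaining <= upper[i]:
                return multinomial_pmf(prefix + [remaining], probs)
            return 0.0
        total = 0.0
        for k in range(lower[i], min(upper[i], remaining) + 1):
            total += walk(i + 1, remaining - k, prefix + [k])
        return total

    return walk(0, n, [])


if __name__ == "__main__":
    # Probability that no cell exceeds 2 when 4 items fall into 3 cells.
    print(rectangular_probability(4, [0.5, 0.3, 0.2], [0, 0, 0], [2, 2, 2]))
```

Such an enumeration is mainly useful as a reference value against which a faster method can be checked on small cases.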
Notes
The notation \(t=\Theta(N)\) means that t is bounded above and below by a linear function of N, while \(t=\mathcal{O}(N)\) means that t is only bounded above by a linear function of N. This distinction matters here because the argument in the proof differs depending on whether t is a \(\Theta(N)\), or a \(\mathcal{O}(N)\) without being a \(\Theta(N)\).
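Written out under the usual asymptotic convention (the constants \(c_1\), \(c_2\) and the threshold \(N_0\) are names introduced here for illustration), the two notations read:

\[
t=\Theta(N) \iff \exists\, c_1, c_2 > 0,\ \exists\, N_0:\ c_1 N \le t \le c_2 N \quad \text{for all } N \ge N_0,
\]
\[
t=\mathcal{O}(N) \iff \exists\, c_2 > 0,\ \exists\, N_0:\ t \le c_2 N \quad \text{for all } N \ge N_0.
\]

For instance, \(t=\sqrt{N}\) is a \(\mathcal{O}(N)\) without being a \(\Theta(N)\), since no constant \(c_1>0\) satisfies \(c_1 N \le \sqrt{N}\) for all large N.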
For which the wrong value of 0.030837 is reported (using a storage of 1373701 floating point numbers for the computation).
For which the wrong values of resp. 0.877373 and 0.750895 are reported.
Acknowledgements
I would like to thank Prof. Kenneth J. Berry and Prof. Jesse Frey, who kindly provided the source code of their algorithms and additional bibliographical material, as well as the two anonymous reviewers for their very valuable remarks and advice, which significantly improved the manuscript.
About this article
Cite this article
Lebrun, R. Efficient time/space algorithm to compute rectangular probabilities of multinomial, multivariate hypergeometric and multivariate Pólya distributions. Stat Comput 23, 615–623 (2013). https://doi.org/10.1007/s11222-012-9334-8
DOI: https://doi.org/10.1007/s11222-012-9334-8