Efficient computation of multivariate empirical distribution functions at the observed values

Lee, David; Joe, Harry

doi:10.1007/s00180-017-0771-x

Original Paper
Published: 17 October 2017

Volume 33, pages 1413–1428, (2018)
Computational Statistics


Abstract

Consider the evaluation of model-based functions of cumulative distribution functions that are integrals. When the cumulative distribution function does not have a tractable form but simulation of the multivariate distribution is easily feasible, we can evaluate the integral via a Monte Carlo sample, replacing the model-based distribution function by the empirical distribution function. Given a simulation sample of size N, the naive method uses $O(N^{2})$ comparisons to compute the empirical distribution function at all N sample vectors. To obtain faster computational speed when N needs to be large to achieve a desired accuracy, we propose methods modified from the popular merge sort and quicksort algorithms that preserve their average $O(N\log _{2}N)$ complexity in the bivariate case. The modified merge sort algorithm can be extended to the computation of a d-dimensional empirical distribution function at the observed values with $O(N\log _{2}^{d-1}N)$ complexity. Simulation studies suggest that the proposed algorithms provide substantial time savings when N is large.
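As an illustration of the complexity gap described in the abstract, the following is a minimal Python sketch of the bivariate case. It is not the authors' algorithm: it uses a standard merge-sort counting scheme (the same idea as inversion counting) and assumes continuous data with no ties in either coordinate. The function names `ecdf_naive` and `ecdf_mergesort` are hypothetical and chosen only for this example.

```python
def ecdf_naive(x, y):
    """O(N^2) baseline: compare every sample point with every other."""
    n = len(x)
    return [
        sum(1 for j in range(n) if x[j] <= x[i] and y[j] <= y[i]) / n
        for i in range(n)
    ]


def ecdf_mergesort(x, y):
    """O(N log N) sketch, assuming no ties in x or in y.

    Points are first ordered by x. A merge sort on y then counts, for each
    point, how many points earlier in the x-order have a smaller-or-equal y.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # indices by increasing x
    counts = [1] * n  # every point is counted once for itself

    def sort_by_y(idx):
        # Returns idx sorted by y and updates counts as a side effect.
        if len(idx) <= 1:
            return idx
        mid = len(idx) // 2
        left = sort_by_y(idx[:mid])    # smaller x values
        right = sort_by_y(idx[mid:])   # larger x values
        merged = []
        a = 0  # number of left-half points already merged
        for r in right:
            while a < len(left) and y[left[a]] <= y[r]:
                merged.append(left[a])
                a += 1
            # All 'a' merged left-half points have smaller x and y <= y[r].
            counts[r] += a
            merged.append(r)
        merged.extend(left[a:])
        return merged

    sort_by_y(order)
    return [c / n for c in counts]
```

On a sample without ties, both functions should return the same values; the second one avoids the pairwise comparison loop, which matters once N is in the hundreds of thousands.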

Notes

The relationship $1\left\{ Y_{{mj}}\ge Y_{{ij}},Y_{{mk}}\ge Y_{{ik}}\right\} =1\left\{ -Y_{{mj}}\le -Y_{{ij}},-Y_{{mk}}\le -Y_{{ik}}\right\} $ allows one to obtain the empirical survival function at the same order of complexity as the empirical cdf. We therefore only focus on the cdf in this paper.
The densities are coupled with standard normal margins for a better illustration of the permutation asymmetry.
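The identity in the first note can be checked numerically. The sketch below is only an illustration with hypothetical function names; it uses naive pairwise counting so that the equality is easy to see, and any faster cdf routine could be substituted for `ecdf_counts`.

```python
def ecdf_counts(x, y):
    """Number of sample points m with x[m] <= x[i] and y[m] <= y[i], for each i."""
    n = len(x)
    return [
        sum(1 for m in range(n) if x[m] <= x[i] and y[m] <= y[i])
        for i in range(n)
    ]


def survival_counts(x, y):
    """Number of sample points m with x[m] >= x[i] and y[m] >= y[i], for each i."""
    n = len(x)
    return [
        sum(1 for m in range(n) if x[m] >= x[i] and y[m] >= y[i])
        for i in range(n)
    ]


def survival_via_negation(x, y):
    """Survival counts obtained by applying the cdf routine to the negated sample."""
    return ecdf_counts([-v for v in x], [-v for v in y])
```

For any sample, `survival_via_negation(x, y)` should equal `survival_counts(x, y)`, which is why a cdf algorithm also yields the survival function at the same order of complexity.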

Acknowledgements

This research has been supported by UBC’s Four Year Doctoral Fellowship and NSERC Discovery Grant 8698. We would like to thank the editors and the anonymous referee for the comments that led to a better presentation of the paper.

Author information

Authors and Affiliations

Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
David Lee & Harry Joe

Corresponding author

Correspondence to David Lee.

About this article

Cite this article

Lee, D., Joe, H. Efficient computation of multivariate empirical distribution functions at the observed values. Comput Stat 33, 1413–1428 (2018). https://doi.org/10.1007/s00180-017-0771-x

Received: 18 November 2016
Accepted: 06 October 2017
Published: 17 October 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00180-017-0771-x
