Abstract
Consider the evaluation of model-based functions of cumulative distribution functions that are integrals. When the cumulative distribution function does not have a tractable form but simulation of the multivariate distribution is easily feasible, we can evaluate the integral via a Monte Carlo sample, replacing the model-based distribution function by the empirical distribution function. Given a simulation sample of size N, the naive method uses \(O(N^{2})\) comparisons to compute the empirical distribution function at all N sample vectors. To obtain faster computational speed when N needs to be large to achieve a desired accuracy, we propose methods modified from the popular merge sort and quicksort algorithms that preserve their average \(O(N\log _{2}N)\) complexity in the bivariate case. The modified merge sort algorithm can be extended to the computation of a d-dimensional empirical distribution function at the observed values with \(O(N\log _{2}^{d-1}N)\) complexity. Simulation studies suggest that the proposed algorithms provide substantial time savings when N is large.
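As an illustration of the bivariate case, the following is a minimal sketch of a merge-sort-style count, not the authors' implementation. It assumes the sample has no ties in either coordinate, as is the case almost surely for continuous distributions, and the function names `ecdf_naive` and `ecdf_mergesort` are hypothetical. The points are first sorted by the first coordinate; a merge sort on the second coordinate then counts, for each point, how many points from the left half have a smaller or equal second coordinate.

```python
import random


def ecdf_naive(x, y):
    """Empirical cdf at each sample point using O(N^2) comparisons."""
    n = len(x)
    return [
        sum(1 for m in range(n) if x[m] <= x[i] and y[m] <= y[i]) / n
        for i in range(n)
    ]


def ecdf_mergesort(x, y):
    """Empirical cdf at each sample point via a modified merge sort.

    Assumption (not from the source): no ties in x or in y.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # indices by increasing x
    counts = [1] * n  # every point is counted in its own lower-left quadrant

    def sort_count(idx):
        # idx holds indices in increasing x order; returns them sorted by y.
        if len(idx) <= 1:
            return idx
        mid = len(idx) // 2
        left = sort_count(idx[:mid])
        right = sort_count(idx[mid:])
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if y[left[i]] <= y[right[j]]:
                merged.append(left[i])
                i += 1
            else:
                # i left-half points have smaller x and smaller y.
                counts[right[j]] += i
                merged.append(right[j])
                j += 1
        while j < len(right):
            # Left half is exhausted: all of it lies below this point.
            counts[right[j]] += len(left)
            merged.append(right[j])
            j += 1
        merged.extend(left[i:])
        return merged

    sort_count(order)
    return [c / n for c in counts]


if __name__ == "__main__":
    random.seed(1)
    xs = [random.random() for _ in range(1000)]
    ys = [random.random() for _ in range(1000)]
    print(ecdf_mergesort(xs, ys) == ecdf_naive(xs, ys))
```

The initial sort and the merge recursion each cost \(O(N\log _{2}N)\) comparisons. The list slicing and recursion here favour readability over speed.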
Notes
The relationship \(1\left\{ Y_{{mj}}\ge Y_{{ij}},Y_{{mk}}\ge Y_{{ik}}\right\} =1\left\{ -Y_{{mj}}\le -Y_{{ij}},-Y_{{mk}}\le -Y_{{ik}}\right\} \) allows one to obtain the empirical survival function at the same order of complexity as the empirical cdf. We therefore only focus on the cdf in this paper.
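The identity above can be checked numerically. The sketch below is illustrative only: the helper names `ecdf_at_sample` and `esf_at_sample` are hypothetical, and a naive \(O(N^{2})\) cdf routine stands in for any faster one. It obtains the empirical survival function by passing the negated sample to the cdf routine.

```python
def ecdf_at_sample(x, y):
    """Proportion of points m with x[m] <= x[i] and y[m] <= y[i], for each i."""
    n = len(x)
    return [
        sum(1 for m in range(n) if x[m] <= x[i] and y[m] <= y[i]) / n
        for i in range(n)
    ]


def esf_at_sample(x, y):
    """Empirical survival function, computed as the cdf of the negated sample."""
    return ecdf_at_sample([-v for v in x], [-v for v in y])


if __name__ == "__main__":
    x = [0.3, 1.2, -0.5, 2.0]
    y = [1.0, 0.1, -0.7, 0.4]
    print(esf_at_sample(x, y))
```

Because negation only reverses the order of each coordinate, any algorithm for the cdf at the observed values can be reused unchanged for the survival function.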
The densities are coupled with standard normal margins for a better illustration of the permutation asymmetry.
References
Bedford T, Cooke RM (2001) Probability density decomposition for conditionally dependent random variables modeled by vines. Ann Math Artif Intell 32:245–268
Brechmann EC, Czado C, Aas K (2012) Truncated regular vines in high dimensions with application to financial data. Can J Stat 40:68–85
Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge
Gumbel EJ (1960) Distributions des valeurs extrêmes en plusieurs dimensions. Publications de l’Institut de Statistique de l’Université de Paris 9:171–173
Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19:293–325
Knuth DE (1998) The art of computer programming: sorting and searching, vol 3, 2nd edn. Addison-Wesley, Reading, MA
Krupskii P, Joe H (2013) Factor copula models for multivariate data. J Multivar Anal 120:85–101
Tawn JA (1988) Bivariate extreme value theory: models and estimation. Biometrika 75:397–415
Acknowledgements
This research has been supported by UBC’s Four Year Doctoral Fellowship and NSERC Discovery Grant 8698. We would like to thank the editors and the anonymous referee for comments that led to a better presentation of the paper.
Cite this article
Lee, D., Joe, H. Efficient computation of multivariate empirical distribution functions at the observed values. Comput Stat 33, 1413–1428 (2018). https://doi.org/10.1007/s00180-017-0771-x
DOI: https://doi.org/10.1007/s00180-017-0771-x