Abstract
In this paper, we present the fundamentals of a hierarchical algorithm for computing the N-dimensional integral \(\phi (\mathbf {a}, \mathbf {b}; A) = {\int \limits }_{\mathbf {a}}^{\mathbf {b}} H(\mathbf {x}) f(\mathbf {x} | A) \text {d} \mathbf {x}\) representing the expectation of a function H(X) where f(x|A) is the truncated multi-variate normal (TMVN) distribution with zero mean, x is the vector of integration variables for the N-dimensional random vector X, A is the inverse of the covariance matrix Σ, and a and b are constant vectors. The algorithm assumes that H(x) is “low-rank” and is designed for properly clustered X so that the matrix A has “low-rank” blocks and “low-dimensional” features. We demonstrate the divide-and-conquer idea when A is a symmetric positive definite tridiagonal matrix and present the necessary building blocks and rigorous potential theory–based algorithm analysis when A is given by the exponential covariance model. The algorithm overall complexity is O(N) for N-dimensional problems, with a prefactor determined by the rank of the off-diagonal matrix blocks and number of effective variables. Very high accuracy results for N as large as 2048 are obtained on a desktop computer with 16G memory using the fast Fourier transform (FFT) and non-uniform FFT to validate the analysis. The current paper focuses on the ideas using the simple yet representative examples where the off-diagonal matrix blocks are rank 1 and the number of effective variables is bounded by 2, to allow concise notations and easier explanation. In a subsequent paper, we discuss the generalization of current scheme using the sparse grid technique for higher rank problems and demonstrate how all the moments of kth order or less (a total of O(Nk) integrals) can be computed using O(Nk) operations for k ≥ 2 and \(O(N \log N)\) operations for k = 1.
Similar content being viewed by others
References
Arellano-Valle, R.B., Azzalini, A.: On the unification of families of skew-normal distributions. Scand. J. Stat. 33(3), 561–574 (2006)
Arellano-Valle, R.B., Branco, M.D., Genton, M.G.: A unified view on skewed distributions arising from selections. Canadian Journal of Statistics 34 (4), 581–601 (2006)
Arellano-Valle, R.B., Genton, M.G.: On the exact distribution of the maximum of absolutely continuous dependent random variables. Statistics & Probability Letters 78(1), 27–35 (2008)
Azzalini, A., Capitanio, A.: The skew-normal and related families, vol. 3. Cambridge University Press, Cambridge (2014)
Castruccio, S., Huser, R., Genton, M.G.: High-order composite likelihood inference for max-stable distributions and processes. J. Comput. Graph. Stat. 25(4), 1212–1229 (2016)
Huser, R., Dombry, C., Ribatet, M., Genton, M.G.: Full likelihood inference for max-stable data. Stat 8(1), e218 (2019)
Genton, M.G.: Skew-elliptical distributions and their applications: a journey beyond normality. CRC Press (2004)
Genton, M.G., Keyes, D.E., Turkiyyah, G.: Hierarchical decompositions for the computation of high-dimensional multivariate normal probabilities. J. Comput. Graph. Stat. 27(2), 268–277 (2018)
Stephenson, A., Tawn, J.: Exploiting occurrence times in likelihood inference for componentwise maxima. Biometrika 92(1), 213–227 (2005)
Barthelmann, V., Novak, E., Ritter, K.: High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12(4), 273–288 (2000)
Gerstner, T., Griebel, M.: Numerical integration using sparse grids. Numerical algorithms 18(3), 209–232 (1998)
Shen, J., Yu, H.: Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems. SIAM J. Sci. Comput. 32(6), 3228–3250 (2010)
Genz, A., Bretz, F.: Computation of multivariate normal and t probabilities, vol. 195. Springer Science & Business Media, New York (2009)
Azzimonti, D., Ginsbourger, D.: Estimating orthant probabilities of high-dimensional Gaussian vectors with an application to set estimation. J. Comput. Graph. Stat. 27(2), 255–267 (2018)
Botev, Z.I.: The normal law under linear restrictions: simulation and estimation via minimax tilting. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(1), 125–148 (2017)
Botev, Z.I., L’Ecuyer, P.: Efficient probability estimation and simulation of the truncated multivariate student-t distribution. In: Proceedings of the 2015 Winter Simulation Conference, pp 380–391. IEEE Press (2015)
Craig, P.: A new reconstruction of multivariate normal orthant probabilities. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(1), 227–243 (2008)
Fayed, H., Atiya, A.: A novel series expansion for the multivariate normal probability integrals based on Fourier series. Math. Comput. 83(289), 2385–2402 (2014)
Genz, A.: Numerical computation of multivariate normal probabilities. Journal of computational and graphical statistics 1(2), 141–149 (1992)
Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Bornkamp, B., Maechler, M., Hothorn, T.: Multivariate normal and t distributions (2014)
Geweke, J.: Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints and the evaluation of constraint probabilities. Seattle, USA (1991)
Hajivassiliou, V., McFadden, D., Ruud, P.: Simulation of multivariate normal rectangle probabilities and their derivatives theoretical and computational results. Journal of econometrics 72(1-2), 85–134 (1996)
Keane, M.P.: 20 simulation estimation for panel data models with limited dependent variables. Handbook of Statistics 11, 545–571 (1993)
Kuo, F., Sloan, I., Woźniakowski, H: Multivariate integration for analytic functions with Gaussian kernels. Math. Comput. 86(304), 829–853 (2017)
Meyer, C.: Recursive numerical evaluation of the cumulative bivariate normal distribution. arXiv preprint arXiv:1004.3616 (2010)
Miwa, T., Hayter, A.J., Kuriki, S.: The evaluation of general non-centred orthant probabilities. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(1), 223–234 (2003)
Pakman, A., Paninski, L.: Exact Hamiltonian Monte Carlo for truncated multivariate Gaussians. J. Comput. Graph. Stat. 23(2), 518–542 (2014)
Phinikettos, I., Gandy, A.: Fast computation of high-dimensional multivariate normal probabilities. Computational Statistics & Data Analysis 55(4), 1521–1529 (2011)
Ridgway, J.: Computation of Gaussian orthant probabilities in high dimension. Statistics and computing 26(4), 899–916 (2016)
Wang, X.: Strong tractability of multivariate integration using quasi–Monte Carlo algorithms. Math. Comput. 72(242), 823–838 (2003)
Floros, D., Liu, T., Pitsianis, N., Sun, X.: Sparse dual of the density peaks algorithm for cluster analysis of high-dimensional data. In: 2018 IEEE High Performance extreme Computing Conference (HPEC), pp 1–14. IEEE (2018)
Yu, C.D., Reiz, S., Biros, G.: Distributed-memory hierarchical compression of dense SPD matrices. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, p 15. IEEE Press (2018)
Greengard, L., Gueyffier, D., Martinsson, P.-G., Rokhlin, V.: Fast direct solvers for integral equations in complex three-dimensional domains. Acta Numerica 18, 243–275 (2009)
Hackbusch, W.: A sparse matrix arithmetic based on \({\mathscr{H}}\)-matrices. Part I: Introduction to \({\mathscr{H}}\)-matrices. Computing 62(2), 89–108 (1999)
Hackbusch, W., Khoromskij, B.N.: A sparse \({\mathscr{H}}\)-matrix arithmetic. Computing 64(1), 21–47 (2000)
Ho, K.L., Greengard, L.: A fast direct solver for structured linear systems by recursive skeletonization. SIAM J. Sci. Comput. 34(5), A2507–A2532 (2012)
Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Mathematics of computation 19(90), 297–301 (1965)
Brandt, A.: Multi-level adaptive solutions to boundary-value problems. Mathematics of computation 31(138), 333–390 (1977)
Hackbusch, W.: Multi-grid methods and applications, vol. 4. Springer Science & Business Media, New York (2013)
Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. Journal of computational physics 73(2), 325–348 (1987)
Greengard, L., Rokhlin, V.: A new version of the fast multipole method for the Laplace equation in three dimensions. Acta numerica 6, 229–269 (1997)
Frigo, M., Johnson, S.G.: FFTW: An adaptive software architecture for the FFT. In: Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, 3, pp. 1381–1384. IEEE (1998)
Barnett, A.H., Magland, J., af Klinteberg, L.: A parallel nonuniform fast fourier transform library based on an “exponential of semicircle” kernel. SIAM J. Sci. Comput. 41(5), C479–C504 (2019)
Greengard, L., Lee, J-Y: Accelerating the nonuniform fast Fourier transform. SIAM review 46(3), 443–454 (2004)
Lee, J-Y, Greengard, L.: The type 3 nonuniform FFT and its applications. J. Comput. Phys. 206(1), 1–5 (2005)
Bebendorf, M.: Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems. Lecture Notes in Computational Science and Engineering, vol. 63. Springer, New York (2008)
Li, K.-C.: Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86(414), 316–327 (1991)
Ho, K.L., Ying, L.: Hierarchical interpolative factorization for elliptic operators: differential equations. Commun. Pure Appl. Math. 69(8), 1415–1451 (2016)
Ho, K.L., Ying, L.: Hierarchical interpolative factorization for elliptic operators: integral equations. Commun. Pure Appl. Math. 69(7), 1314–1353 (2016)
Halko, N., Martinsson, P.-G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53(2), 217–288 (2011)
Boyd, J.P.: Asymptotic Fourier coefficients for a \(c^{\infty }\) bell (smoothed-“top-hat”) and the Fourier extension problem. J. Sci. Comput. 29(1), 1–24 (2006)
Abrarov, S.M., Quine, B.M.: Efficient algorithmic implementation of the Voigt/complex error function based on exponential series approximation. Appl. Math. Comput. 218(5), 1894–1902 (2011)
Abrarov, S.M., Quine, B.M.: On the Fourier expansion method for highly accurate computation of the Voigt/complex error function in a rapid algorithm. arXiv preprint arXiv:1205.1768 (2012)
Gautschi, W.: Efficient computation of the complex error function. SIAM J. Numer. Anal. 7(1), 187–198 (1970)
Karbach, T.M., Raven, G., Schiller, M.: Decay time integrals in neutral meson mixing and their efficient evaluation. arXiv preprint arXiv:1407.0748 (2014)
Gilbert, A.C., Indyk, P., Iwen, M., Schmidt, L.: Recent developments in the sparse Fourier transform: A compressed Fourier transform for big data. IEEE Signal Process. Mag. 31(5), 91–100 (2014)
Nobile, F., Tempone, R., Webster, C.G.: A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46(5), 2309–2345 (2008)
Gander, M.J.: An eigendecomposition updating algorithm for large Hermitian matrices under low rank perturbations (1998)
Acknowledgements
J. Huang was supported by the NSF grant DMS1821093, and the work was finished while he was a visiting professor at the King Abdullah University of Science and Technology, National Center for Theoretical Sciences (NCTS) in Taiwan, Mathematical Center for Interdisciplinary Research of Soochow University, and Institute for Mathematical Sciences of the National University of Singapore.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by: Leslie Greengard
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Potential theory–based analysis
Appendix: Potential theory–based analysis
The covariance matrix and covariance functions are often related to the solutions of elliptic partial differential equations. In the following, we apply the potential theory from the analysis of ordinary and partial differential equations and show how the divide-and-conquer strategy can be successfully performed on the hierarchical tree structure when A is the exponential matrix. A statistical analysis–based approach for a general \({\mathscr{H}}\)-matrix with rank 1 off-diagonals is presented in Theorem 4. Purely numerical linear algebra–based approaches for more general cases will be addressed in subsequent papers.
Green’s functions: We present the results for bz = 1 to simplify the notations and assume zj ∈ [0, 1]. We start from the observation that
is the domain Green’s function of the ordinary differential equation (ODE) two-point boundary value problem
where \(coef=\frac 12\), gl(z) = ez− 1 and gr(z) = e1−z. The proof is a straightforward validation that \(u(z)= {{\int \limits }_{0}^{1}} G(z,y) f(y) dy\) satisfies both the ODE and boundary conditions.
In the following discussions, we consider the continuous version of the original matrix problem, where the matrix A is the discretized Green’s function G(z, y), the two off-diagonal submatrices A1,2 and A1,3 are the discretized gr(z) ⋅ gl(y) and gr(y) ⋅ gl(z), respectively. Some simple algebra manipulations show that the submatrices \(A_{1,1}- \frac {\lambda }{\gamma ^{2}} \mathbf {v} \mathbf {v}^{T}\) and \(A_{1,4}-\gamma ^{2} \lambda \mathbf {u} \mathbf {u}^{T}\) can be considered as the discretized \(G(z,y)-\tilde {\gamma }^{2} \cdot g_{l}(z) \cdot g_{l}(y)\) and \(G(z,y)-\frac {1}{\tilde {\gamma }^{2}} \cdot g_{r}(z) \cdot g_{r}(y)\), and the coefficients \(i t \frac {\sqrt {2\lambda }}{\gamma } \mathbf {v}^{T}\) and \(i t \sqrt {2\lambda } \gamma \mathbf {u}^{T}\) for the linear terms of the x-variables x1,1 and x1,2 in Eq. (??) are the discretized \(i t \tilde {\gamma } g_{l}(z)\) and \( i t \frac {1}{\tilde {\gamma }} g_{r}(z)\), respectively.
Remark: The observation also allows easy proof of the positive definiteness of the matrix A, which is the discretized Green’s function G(z, y). In order to show that for any vector f ≠ 0, the quadratic form satisfies \(\frac 12 \mathbf {f}^{T} A \mathbf {f} >0\), we consider its continuous version defined as
where \(u(z)= {{\int \limits }_{0}^{1}} G(z,y) f(y) dy\) and f(y) is the continuous version of the (discretized) vector f. As \(f(z)=u(z)-u^{\prime \prime }(z)\), applying the integration by parts, we have
As f(z) ≠ 0, therefore u(z) ≠ 0, and applying the boundary conditions of the ODE, we have \({{\int \limits }_{0}^{1}} f(z) u(z) dz>0\). We refer to the two-variable function G(z, y) as a positive definite function. The positive definiteness of the matrix A can be proved in a similar way using the discretized integration by parts.
A particular choice of γ can be determined by considering the corresponding child ODE problems as follows. We first study the root problem and define its two children as the left child and right child, and the locations zi of the left child and zj of the right child satisfy the condition zi < zj as the z-locations of the x-variables in the two child problems are separated and ordered. We pick a location ζ between the two clusters of z-locations. Note that the choice of ζ is not unique, and a particular choice is the midpoint of the two sets. We have the following results for the root node.
Theorem 2
If we choose \(\tilde {\gamma } = \frac {e^{-\zeta }}{\sqrt {2}}\), then
-
1.
For the left child, the new function \(G_{l}(z,y)=G(z,y)-\tilde {\gamma }^{2} \cdot g_{l}(z) \cdot g_{l}(y)\) is the Green function of the ODE problem
$$ \left\{ \begin{array}{l} \mathbf{u}_1(z)-\mathbf{u}_1^{\prime\prime}(z)=f(z), \quad z \in [0,\zeta], \\ \mathbf{u}_1(0)=\mathbf{u}_1^{\prime}(0), \quad \mathbf{u}_1(\zeta)=0. \end{array} \right. $$(19)The function Gl(z, y) is positive definite.
-
2.
For the right child, the new function \(G_{r}(z,y)=G(z,y)-\frac {1}{\tilde {\gamma }^{2}} \cdot g_{r}(z) \cdot g_{r}(y)\) is the Green function of the ODE problem
$$ \left\{ \begin{array}{l} \mathbf{u}_2(z)-\mathbf{u}_2^{\prime\prime}(z)=f(z), \quad z \in [\zeta, 1], \\ \mathbf{u}_2(\zeta)=0, \quad \mathbf{u}_2(1)=-\mathbf{u}_2^{\prime}(1). \end{array} \right. $$(20)The function Gr(z, y) is positive definite.
-
3.
The two child ODE problem solutions u1(z) and u2(z) can be derived by subtracting a single-layer potential defined at z = ζ from the parent’s solution u(z) of Eq. (18), so that solutions u1(z) and u2(z) satisfy the zero interface condition at z = ζ. The other boundary condition for each child ODE problem is the same as its parent’s boundary condition.
These results can be easily validated by plugging in the functions to the ODE problems. The positive definiteness of the child Green’s function can be proved using the same integration by part technique as we did for the parent’s Green’s function.
For a general parent node on the tree structure, we have the following generalized results.
Theorem 3
Consider a parent node with the corresponding function Gp(z, y) defined on the interval [a, b], and ζ is a point separating the two children’s z-locations. Then there exists a number \(\tilde {\gamma }\) which depends on ζ, such that
-
1.
For the left child, the new function \(G_{l}(z,y)=G(z,y)-\tilde {\gamma }^{2} \cdot g_{l}(z) \cdot g_{l}(y)\) is the Green function of the ODE problem
$$ \left\{ \begin{array}{l} \mathbf{u}_1(z)-\mathbf{u}_1^{\prime\prime}(z)=f(z), \quad z \in [a,\zeta], \\ \text{same boundary condition as parent at } x=a, \text{ and } \mathbf{u}_1(\zeta)=0. \end{array} \right. $$(21)The function Gl(z, y) is positive definite.
-
2.
For the right child, the new function \(G_{r}(z,y)=G(z,y)-\frac {1}{\tilde {\gamma }^{2}} \cdot g_{r}(z) \cdot g_{r}(y)\) is the Green function of the ODE problem
$$ \left\{ \begin{array}{l} \mathbf{u}_2(z)-\mathbf{u}_2^{\prime\prime}(z)=f(z), \quad z \in [\zeta, b], \\ \mathbf{u}_2(\zeta)=0, \text{ and same boundary condition as parent at } x=b. \end{array} \right. $$(22)The function Gr(z, y) is positive definite.
-
3.
The two child ODE problem solutions u1(z) and u2(z) can be derived by subtracting a single-layer potential defined at z = ζ from the parent’s solution u(z) of Eq. (18), so that solutions u1(z) and u2(z) satisfy the zero interface condition at z = ζ. The other boundary condition for each child ODE problem is the same as its parent’s boundary condition.
The detailed formulas for the number \(\tilde {\gamma }\) and Green’s functions are presented next. The proof of the theorem is simply validations of the formulas.
The Green function Gp(z, y) of a parent node p and the functions G1(z, y) and G2(z, y) of p’s left child 1 and right child 2 are defined as
We assume parent’s z-locations satisfy z ∈ [a, b]. We choose ζ = c to separate the parent’s locations, and the child intervals are therefore [a, c] and [c, b], respectively.
Case 1: p is the root node (a = 0, b = 1): The functions are
Case 2: p is a left boundary node(a = 0): The functions are
Case 3: p is a right boundary node(b = 1): The functions are
Case 4: p is an interior node: The functions are
Next, we present the relations of the parent p’s two w-variables \({w_{1}^{p}}\) and \({w_{2}^{p}}\) with the left child 1’s two w-variables {\({w_{1}^{1}}\), \({w_{2}^{1}}\)} and right child 2’s two w-variables {\({w_{1}^{2}}\), \({w_{2}^{2}}\)}. We use tnew to represent the new t-variable introduced to divide the parent problem into two sub-problems of child 1 and child 2. We use a unified set of basis functions for each node on the hierarchical tree structure. For the parent node, the basis functions are \(\{ {{{\varPhi }}_{1}^{p}}=\cosh (z-c), {{{\varPhi }}_{2}^{p}}=\frac {\sinh (z-c)}{b-a} \}\). The basis functions for the left and right children are \(\{ {{{\varPhi }}_{1}^{1}} = \cosh (z-p), {{{\varPhi }}_{2}^{1}}=\frac {\sinh (z-p)}{c-a} \}\) and \(\{ {{{\varPhi }}_{1}^{2}}=\cosh (z-q), {{{\varPhi }}_{2}^{2}}=\frac {\sinh (z-q)}{b-c} \}\), respectively, where p and q are either the interface ζ points when further subdividing the two child problems, or the mid-point of the child intervals when they become leaf nodes.
Case 1: p is the root node (a = 0, b = 1): Parent has no effective w-variables.
Case 2: p is a left boundary node(a = 0):
Case 3: p is a right boundary node(b = 1):
Case 4: p is an interior node:
Mathematica files for computing these formulas are available.
Finally, we present a theorem which guarantees the positive definiteness of the child problems when a proper parameter is chosen in the divide-and-conquer scheme for a \({\mathscr{H}}\)-matrix with rank 1 off-diagonal blocks. The theorem applies to both the tridiagonal and exponential cases.
Theorem 4
Consider a positive definite covariance matrix of two random vectors X and Y with rank one off-diagonal blocks in the form

where c is a constant and u = [u1, u2, ⋯ , un] and v = [v1, v2, ⋯ , vn] are row vectors. Then, there exists an interval of γ values such that the matrices
are both symmetric positive definite.
Proof
Without loss of generality, we assume c > 0. Note that any symmetric positive definite matrix \(\left (\begin {array}{cc} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end {array} \right ) \) can be reduced to the standard form in Eq. (31) using an orthogonal transformation \(\left (\begin {array}{cc} U & \huge 0 \\ \huge 0 & V \end {array} \right )\) where U and V are the normalized eigenvectors for A1,1 and A2,2, respectively. Apply Lemma 2.2 from [58], we need to find γ such that
We consider the conditional variance
which is symmetric positive definite from statistics theory (or linear algebra). Therefore using Lemma 2.2 from [58], we have
which is equivalent to \( c {\sum }_{k=1}^{n} \frac {{u_{i}^{2}}}{{\sigma _{i}^{2}}} \leq \frac {1}{c {\sum }_{k=1}^{n} \frac {{v_{i}^{2}}}{{\lambda _{i}^{2}}}} \). Clearly, any γ value in the interval \([ c {\sum }_{k=1}^{n} \frac {{u_{i}^{2}}}{{\sigma _{i}^{2}}}, \frac {1}{c {\sum }_{k=1}^{n} \frac {{v_{i}^{2}}}{{\lambda _{i}^{2}}}} ]\) will satisfy the conditions in Eqs. (16), e.g., one can choose the mid-point of this interval to “balance” the positive definiteness of the two child problems. This completes the proof. □
Rights and permissions
About this article
Cite this article
Huang, J., Cao, J., Fang, F. et al. An O(N) algorithm for computing expectation of N-dimensional truncated multi-variate normal distribution I: fundamentals. Adv Comput Math 47, 65 (2021). https://doi.org/10.1007/s10444-021-09888-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10444-021-09888-1
Keywords
- Exponential covariance model
- Fourier transform
- Hierarchical algorithm
- Low-dimensional structure
- Low-rank structure
- Truncated multi-variate normal distribution