
The locally Gaussian density estimator for multivariate data

Abstract

It is well known that the Curse of Dimensionality causes the standard Kernel Density Estimator to break down quickly as the number of variables increases. In non-parametric regression, this effect is relieved in various ways, for example by assuming additivity or some other simplifying structure on the interaction between variables. This paper presents the Locally Gaussian Density Estimator (LGDE), which introduces a similar idea to the problem of density estimation. The LGDE is a new method for the non-parametric estimation of multivariate probability density functions. It is based on preliminary transformations of the marginal observation vectors towards standard normality, and a simplified local likelihood fit of the resulting distribution with standard normal marginals. The LGDE is introduced, and asymptotic theory is derived. In particular, it is shown that the LGDE converges at a speed that does not depend on the dimension. Examples using real and simulated data confirm that the new estimator performs very well on finite sample sizes.

References

  • Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44(2), 182–198 (2009)

  • Andrews, D.W.K.: Generic uniform convergence. Econom. Theory 8(2), 241–257 (1992)

  • Azzalini, A.: The skew-normal distribution and related multivariate families. Scand. J. Stat. 32(2), 159–188 (2005)

  • Bedford, T., Cooke, R.M.: Vines: a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)

  • Berentsen, G.D., Tjøstheim, D.: Recognizing and visualizing departures from independence in bivariate data using local Gaussian correlation. Stat. Comput. 24(5), 785–801 (2014)

  • Berentsen, G.D., Støve, B., Tjøstheim, D., Nordbø, T.: Recognizing and visualizing copulas: an approach using local Gaussian approximation. Insur. Math. Econ. 57, 90–103 (2014)

  • Billingsley, P.: Probability and Measure. Wiley, Oxford (2008)

  • Bowman, A.W.: An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353–360 (1984)

  • Friedman, J.H., Stuetzle, W., Schroeder, A.: Projection pursuit density estimation. J. Am. Stat. Assoc. 79(387), 599–608 (1984)

  • Geenens, G.: Probit transformation for kernel density estimation on the unit interval. J. Am. Stat. Assoc. 109(505), 346–358 (2014)

  • Geenens, G., Charpentier, A., Paindaveine, D.: Probit transformation for nonparametric kernel estimation of the copula density. arXiv:1404.4414 (preprint) (2014)

  • Genest, C., Segers, J.: On the covariance of the asymptotic empirical copula process. J. Multivar. Anal. 101(8), 1837–1845 (2010)

  • Hall, P.: On Kullback–Leibler loss and density estimation. Ann. Stat. 15(4), 1491–1519 (1987)

  • Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer, Berlin (2009)

  • Hjort, N.L., Glad, I.K.: Nonparametric density estimation with a parametric start. Ann. Stat. 23(3), 882–904 (1995)

  • Hjort, N.L., Jones, M.C.: Locally parametric nonparametric density estimation. Ann. Stat. 24(4), 1619–1647 (1996)

  • Hwang, J.-N., Lay, S.-R., Lippman, A.: Nonparametric multivariate density estimation: a comparative study. IEEE Trans. Signal Process. 42(10), 2795–2810 (1994)

  • Jones, M.C.: Simple boundary correction for kernel density estimation. Stat. Comput. 3(3), 135–146 (1993)

  • Loader, C.R.: Local likelihood density estimation. Ann. Stat. 24(4), 1602–1618 (1996)

  • Mikosch, T.: Copulas: tales and facts. Extremes 9(1), 3–20 (2006)

  • Nagler, T., Czado, C.: Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. J. Multivar. Anal. 151, 69–89 (2016)

  • Newey, W.K.: Uniform convergence in probability and stochastic equicontinuity. Econometrica 59(4), 1161–1167 (1991)

  • Otneim, H., Karlsen, H.A., Tjøstheim, D.: Bias and bandwidth for local likelihood density estimation. Stat. Probab. Lett. 83(5), 1382–1387 (2013)

  • Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)

  • Politis, D.N., Romano, J.P.: Multivariate density estimation with general flat-top kernels of infinite order. J. Multivar. Anal. 68(1), 1–25 (1999)

  • R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2015)

  • Ruppert, D., Cline, D.B.H.: Bias reduction in kernel density estimation by smoothed empirical transformations. Ann. Stat. 22(1), 185–210 (1994)

  • Schervish, M.J.: Theory of Statistics. Springer, Berlin (1995)

  • Severini, T.A.: Likelihood Methods in Statistics. Oxford Science Publications. Oxford University Press, Oxford (2000)

  • Silverman, B.W.: Density Estimation for Statistics and Data Analysis, vol. 26. CRC Press, London (1986)

  • Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)

  • Sperlich, S., Tjøstheim, D., Yang, L.: Nonparametric estimation and testing of interaction in additive models. Econom. Theory 18(02), 197–251 (2002)

  • Stone, C.J.: An asymptotically optimal window selection rule for kernel density estimates. Ann. Stat. 12(4), 1285–1297 (1984)

  • Stone, C.J.: Large-sample inference for log-spline models. Ann. Stat. 18(2), 717–741 (1990)

  • Stone, C.J., Hansen, M.H., Kooperberg, C., Truong, Y.K., et al.: Polynomial splines and their tensor products in extended linear modeling: 1994 Wald Memorial Lecture. Ann. Stat. 25(4), 1371–1470 (1997)

  • Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B (Methodological) 36(2), 111–147 (1974)

  • Støve, B., Tjøstheim, D., Hufthammer, K.O.: Using local Gaussian correlation in a nonlinear re-examination of financial contagion. J. Empir. Fin. 25, 62–82 (2014)

  • Tjøstheim, D., Hufthammer, K.O.: Local Gaussian correlation: a new measure of dependence. J. Econom. 172(1), 33–48 (2013)

  • Van der Vaart, A.W.: Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge (2000)

  • Wand, M.P., Jones, M.C.: Multivariate plug-in bandwidth selection. Comput. Stat. 9(2), 97–116 (1994)

  • Wand, M.P., Marron, J.S., Ruppert, D.: Transformations in density estimation. J. Am. Stat. Assoc. 86(414), 343–353 (1991)

Acknowledgments

The authors gratefully acknowledge Bård Støve for supplying the data set, and Kjersti Aas for valuable comments in the preparation of this paper. The authors would also like to thank two anonymous referees, whose comments have greatly improved this paper.

Author information

Corresponding author

Correspondence to Håkon Otneim.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 49 KB)

Appendices

Appendix A: Proofs

1.1 A.1: Proof of Theorem 1

The method of proof is the same as that of Severini (2000), pp. 105–107 for ordinary maximum likelihood estimates. The proof requires the additional assumption of uniform convergence in probability of the local likelihood function:

$$\begin{aligned} \sup _{\rho \in \varTheta } \left| L_n(\rho ,\mathbf {Z}) - Q_{\mathbf {h}_n,K}(\rho )\right| \mathop {\rightarrow }\limits ^{P} 0 \,\,\, \text { as } \,\,\, n\rightarrow \infty . \end{aligned}$$
(13)

The bivariate version of (6) satisfies condition (13) provided that condition (ii) is fulfilled. To see this, consider \(\psi (\cdot ,\rho )\) as a function of the parameter; it is bounded and differentiable to any order on the compact set \(\varTheta \), and so is its logarithm. Thus \(g(\rho ) = \log \psi (\cdot ,\rho )f(\cdot ) - \psi (\cdot ,\rho )\) is uniformly continuous there, so for every \(\epsilon >0\) there exists a \(\delta >0\) such that \(|\rho _1-\rho _2|<\delta \) implies \(|g(\rho _1)-g(\rho _2)|<\epsilon \). Multiplying by a kernel and integrating over a different variable preserves this property, because if \(|\rho _1-\rho _2|<\delta \), then

$$\begin{aligned}&\left| \int K_{\mathbf {h}_n}(\mathbf {y}-\mathbf {z})g(\rho _1)\,\text {d}\mathbf {y} - \int K_{\mathbf {h}_n}(\mathbf {y}-\mathbf {z})g(\rho _2)\,\text {d}\mathbf {y}\right| \\&\quad \le \int K_{\mathbf {h}_n}(\mathbf {y}-\mathbf {z})|g(\rho _1)-g(\rho _2)|\,\text {d}\mathbf {y} \\&\quad < \epsilon \int K_{\mathbf {h}_n}(\mathbf {y}-\mathbf {z})\,\text {d}\mathbf {y} = \epsilon . \end{aligned}$$

The \(\epsilon \) and \(\delta \) depend on neither \(\mathbf {h}\) nor n, so \(\{Q_{\mathbf {h}_n,K}(\rho )\}\) forms an equicontinuous family of functions. Further, and again exploiting the smoothness of \(\psi (\cdot ,\rho )\) on the compact set \(\varTheta \), the local likelihood functions are Lipschitz continuous there by the mean value theorem. The conditions in Corollary 2.2 of Newey (1991) are thus satisfied, and condition (13) follows. It follows from the uniform convergence that

$$\begin{aligned} \sup _{\rho \in \varTheta }L_n(\rho , \mathbf {Z}) = L_n(\widehat{\rho },\mathbf {Z}) \mathop {\rightarrow }\limits ^{P} \sup _{\rho \in \varTheta }Q(\rho ) = Q(\rho _0). \end{aligned}$$

The rest of the argument follows exactly that of Severini (2000), pp. 105–107 for ordinary maximum likelihood estimates.

1.2 A.2: Proof of Theorems 2 and 3

We establish joint asymptotic normality of the local correlation vector by first following the standard argument for ordinary maximum likelihood estimates in the bivariate, and thus one-parameter, case, and then applying a central limit argument, which amounts to a proof of Theorem 2. We then use the Cramér–Wold device to cover the multi-parameter case. Finally, we show that the off-diagonal elements of the covariance matrix vanish asymptotically. In the bivariate case, we must verify the following conditions in order to use Theorem 7.63 in Schervish (1995) and Theorem 1A of Parzen (1962):

  1. (I)

    The parametric family \(\psi (\mathbf {z},\rho )\) is continuously differentiable with respect to \(\rho \);

  2. (II)

    \(\int |u(\mathbf {z},\rho _0) f(\mathbf {z})|\,\text {d}\mathbf {z}<\infty \);

  3. (III)

    There exists a function \(T_r(\mathbf {z},\rho )\) such that for each \(\rho _0\in \text {int}(\varTheta )\),

    $$\begin{aligned} \sup _{|\rho -\rho _0|\le r} \left| \partial ^2 L_n(\rho _0, \mathbf {z})/\partial \rho ^2 - \partial ^2L_n(\rho ,\mathbf {z})/\partial \rho ^2 \right| \le T_r(\mathbf {z},\rho _0) \end{aligned}$$

    with \(\lim _{r\rightarrow 0}\text {E}T_r(\mathbf {Z},\rho _0) = 0\) (stochastic equicontinuity).

The parametric family is Gaussian, so condition (I) is obviously true. The local score function \(u(\mathbf {z},\rho ) = \partial \log \psi (\mathbf {z},\rho )/\partial \rho \) in the bivariate Gaussian case is given by

$$\begin{aligned} u(z_1,z_2,\rho ) = \frac{\rho ^3 - z_1z_2(1+\rho ^2) + (z_1^2+z_2^2-1)\rho }{(1-\rho ^2)^2}, \end{aligned}$$
(14)

and the stochastic variable \(\mathbf {Z} = (Z_1,Z_2)\), having density \(f_{\mathbf {Z}}\), has moments of all orders since the marginals are standard normal. Therefore, \(\text {E}|u(\mathbf {Z},\rho )|<\infty \), so (II) is satisfied. Further, Andrews (1992) shows that uniform continuity of \(\partial ^2L_n(\rho )/\partial \rho ^2\), together with Lipschitz continuity of \(|\partial ^2L_n(\rho ,\mathbf {z})/\partial \rho ^2 - \partial ^2L_n(\rho _0,\mathbf {z})/\partial \rho ^2|\), suffices for the stochastic equicontinuity required in condition (III). The argument in Appendix A.1 also goes through here.
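
To make these objects concrete, the following is a minimal R sketch (not the paper's supplementary code) of the bivariate local likelihood and its maximiser at a single point. It assumes that (4) has the Hjort–Jones form with a Gaussian product kernel, in which case the penalty term \(\int K_{\mathbf {h}}(\mathbf {y}-\mathbf {z})\psi (\mathbf {y},\rho )\,\text {d}\mathbf {y}\) is available in closed form as a bivariate normal density with inflated variances, evaluated at \(\mathbf {z}\).

```r
## Minimal sketch, not the paper's supplementary code. Assumes the local
## likelihood (4) has the Hjort-Jones form with a Gaussian product kernel.
log_psi <- function(z1, z2, rho) {         # log bivariate standard normal density
  -log(2 * pi) - 0.5 * log(1 - rho^2) -    # with correlation rho
    (z1^2 - 2 * rho * z1 * z2 + z2^2) / (2 * (1 - rho^2))
}

Ln <- function(rho, z, Z, h) {             # local likelihood at the point z
  Kh <- dnorm((Z[, 1] - z[1]) / h) * dnorm((Z[, 2] - z[2]) / h) / h^2
  s  <- 1 + h^2                            # K_h * psi is the N(0, Sigma_rho + h^2 I)
  dt <- s^2 - rho^2                        # density, evaluated here at z
  pen <- exp(-0.5 * (s * (z[1]^2 + z[2]^2) - 2 * rho * z[1] * z[2]) / dt) /
    (2 * pi * sqrt(dt))
  mean(Kh * log_psi(Z[, 1], Z[, 2], rho)) - pen
}

## Example: local correlation estimate at z = (0, 0) from simulated Gaussian data
set.seed(1)
Z1 <- rnorm(500); Z2 <- 0.5 * Z1 + sqrt(1 - 0.5^2) * rnorm(500)
rho_hat <- optimize(Ln, c(-0.99, 0.99), z = c(0, 0), Z = cbind(Z1, Z2),
                    h = 0.7, maximum = TRUE)$maximum
rho_hat                                    # should land roughly around 0.5
```

The population counterpart \(Q_{\mathbf {h}_n,K}(\rho )\) in (13) is obtained by replacing the sample average in the sketch with its expectation under f.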

Using a one-term Taylor expansion of the local score function \(\partial L_n(\widehat{\rho },\mathbf {z})/\partial \rho \), and following Schervish (1995), p. 422, in writing 0 as \(o_P\left( (nh_n^2)^{-1/2}\right) \), we get

$$\begin{aligned} \partial L_n(\rho _0, \mathbf {z})/\partial \rho + B_{n,h}(\widehat{\rho }_n - \rho _0) = o_P\left( (nh_n^2)^{-1/2}\right) , \end{aligned}$$

where \(B_{n,h} = \partial ^2 L_n(\rho ^*, \mathbf {z})/\partial \rho ^2\), and \(\rho ^*\) lies between \(\rho _0\) and \(\widehat{\rho }\). As \(n\rightarrow \infty \), this quantity tends to its expectation, which we denote by \(J_h\) and which is given by

$$\begin{aligned} J_h&= \int K_{\mathbf {h}}(\mathbf {y} - \mathbf {z})u^2(\mathbf {y},\rho ^*(\mathbf {z}))\psi (\mathbf {y},\rho ^*(\mathbf {z}))\,\text {d}\mathbf {y}\nonumber \\&\quad - \int K_{\mathbf {h}}(\mathbf {y}-\mathbf {z})u'(\mathbf {y},\rho ^*(\mathbf {z}))[f(\mathbf {y}) - \psi (\mathbf {y},\rho ^*(\mathbf {z}))]\,\text {d}\mathbf {y}. \end{aligned}$$
(15)

The arguments of Hjort and Jones (1996), as well as the consistency of \(\widehat{\rho }_n\), can be used to see that,

$$\begin{aligned} J = \lim _{\mathbf {h}\rightarrow 0} J_h = u^2(\mathbf {z},\rho _{0}(\mathbf {z}))\psi (\mathbf {z},\rho _{0}(\mathbf {z})). \end{aligned}$$

Further, the variance of \(\sqrt{nh^2}\partial L_n(\rho _0,\mathbf {Z})/\partial \rho \) approaches \(M_h\) as \(n\rightarrow \infty \), where

$$\begin{aligned} M_h&= h_1h_2\int (h_{1}h_{2})^{-2}K^2(\mathbf {h}^{-1}(\mathbf {y} - \mathbf {z}))\\&\quad \times u^2(\mathbf {y},\rho _{0}(\mathbf {z})) f(\mathbf {y})\,\text {d}\mathbf {y} \\&\quad - h_1h_2\left( \int K_{\mathbf {h}}(\mathbf {y} - \mathbf {z})u(\mathbf {y}, \rho _0(\mathbf {z}))f(\mathbf {y})\,\text {d}\mathbf {y}\right) ^2. \end{aligned}$$

The second term vanishes as \(\mathbf {h}\rightarrow 0\), so we have in the limit that

$$\begin{aligned} M = \lim _{\mathbf {h}\rightarrow 0}M_h = u^2(\mathbf {z},\rho _{0}(\mathbf {z}))f(\mathbf {z})\int K^2(\mathbf {y})\,\text {d}\mathbf {y}. \end{aligned}$$

Following the details of Theorem 7.63 in Schervish (1995), it follows that

$$\begin{aligned} \sqrt{nh_n^2}\left( \widehat{\rho }_n - \rho _0\right) \mathop {\rightarrow }\limits ^{\mathcal L} N(0, M/J^2), \end{aligned}$$

provided that the quantity

$$\begin{aligned} Y_n(\mathbf {z}) = \frac{1}{n}\sum _{i=1}^nK_{\mathbf {h}_n}(\mathbf {Z}_i - \mathbf {z})u(\mathbf {Z}_i,\rho _0) = \frac{1}{n}\sum _{i=1}^nV_{ni}, \end{aligned}$$
(16)

is asymptotically normal; we now proceed to establish this along the lines of Parzen (1962).

Of the two terms comprising the local likelihood function (4), only the first depends on data. It follows readily from Theorem 1A of Parzen (1962) that the variance of the summands in (16), all identically distributed as \(V_n = K_{\mathbf {h}_n}(\mathbf {Z} - \mathbf {z})u(\mathbf {Z},\rho _0)\), satisfies

$$\begin{aligned} h_n^2\text {Var}(V_n) \rightarrow f_{\mathbf {Z}}(\mathbf {z})u^2(\mathbf {z},\rho _0) \int _{-\infty }^{\infty }K^2(\mathbf {y})\,\text {d}\mathbf {y}. \end{aligned}$$
(17)

Further, a simple Taylor expansion reveals that

$$\begin{aligned} \text {E}|V_n|^{2+\delta }&= \int _{-\infty }^{\infty } \left| K_{\mathbf {h}_n}\left( \mathbf {y}-\mathbf {z}\right) u(\mathbf {y},\rho _0)\right| ^{2+\delta }f(\mathbf {y})\,\text {d}\mathbf {y} \nonumber \\&= \frac{1}{(h_{n1}h_{n2})^{1+\delta }}f_{\mathbf {Z}}(\mathbf {z})|u(\mathbf {z},\rho _0)|^{2+\delta }\int _{-\infty }^{\infty }|K(\mathbf {y})|^{2+\delta }\,\text {d}\mathbf {y} \nonumber \\&\quad +\text { higher order terms}. \end{aligned}$$
(18)

The quantity in (17) is finite because of assumption (iv) in Theorem 2. Further,

$$\begin{aligned}&\frac{\text {E}\left| V_n - \text {E}(V_n)\right| ^{2+\delta }}{n^{\delta /2}\sigma ^{2+\delta }(V_n)}\nonumber \\&\quad = \frac{(h_{n1}h_{n2})^{1+\delta }\text {E}|V_n - \text {E}(V_n)|^{2+\delta }}{(nh_{n1}h_{n2})^{\delta /2}(h_{n1}h_{n2})^{1+\delta /2}\sigma ^{2+\delta }(V_n)}, \end{aligned}$$
(19)

which tends to zero as \(n\rightarrow \infty \) because of (17), (18) and the second part of assumption (iv), where, here and in the sequel, we write \(\sigma (X) = \text {sd}(X)\) for a stochastic variable X. The summands comprising \(Y_n(\mathbf {z})\) therefore satisfy the Lyapunov, and thus the Lindeberg, condition, so \(Y_n(\mathbf {z})\) is asymptotically normal.
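
The limit (17) can be checked informally by simulation. The sketch below is not from the paper; it assumes, purely for illustration, that f is a bivariate normal density with correlation 0.5 and that K is the Gaussian product kernel, so that \(\int K^2(\mathbf {y})\,\text {d}\mathbf {y} = 1/(4\pi )\) and both sides of (17) can be computed.

```r
## Monte Carlo sketch of the variance limit (17); illustration only.
## Assumptions: f is bivariate normal with correlation 0.5, and K is the
## Gaussian product kernel (so the integral of K^2 equals 1/(4*pi)).
set.seed(1)
u_score <- function(z1, z2, rho)           # the local score function (14)
  (rho^3 - z1 * z2 * (1 + rho^2) + (z1^2 + z2^2 - 1) * rho) / (1 - rho^2)^2

rho0 <- 0.5; z <- c(1, -1); h <- 0.2; n <- 5e5
Z1 <- rnorm(n); Z2 <- rho0 * Z1 + sqrt(1 - rho0^2) * rnorm(n)
Vn <- dnorm((Z1 - z[1]) / h) * dnorm((Z2 - z[2]) / h) / h^2 *
  u_score(Z1, Z2, rho0)
h^2 * var(Vn)                              # empirical left-hand side of (17)

f_z <- dnorm(z[1]) * dnorm(z[2], mean = rho0 * z[1], sd = sqrt(1 - rho0^2))
f_z * u_score(z[1], z[2], rho0)^2 / (4 * pi)  # the limiting value in (17)
```

For small h the two numbers should agree up to Monte Carlo and smoothing error.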

Having established asymptotic normality for each \(\widehat{\rho }_k\) (and thus proven Theorem 2), we extend the argument above to the p-variate, and thus \(d=p(p-1)/2\)-parameter, case; let \(\mathbf {\rho } = (\rho _1,\ldots ,\rho _d)\) be the vector of local correlations, let \(\mathbf {u}(\mathbf {z},\mathbf {\rho }_0) = (u_1(\mathbf {z},\mathbf {\rho }_0),\ldots ,u_d(\mathbf {z},\mathbf {\rho }_0))\) be the vector of score functions, defined as before by \(u_k(\mathbf {z},\mathbf {\rho }) = \partial \log \psi (\mathbf {z},\mathbf {\rho })/\partial \rho _k\), and, finally, note that \(\mathbf {Y}_n(\mathbf {z}) = n^{-1}\sum _{i=1}^n\mathbf {V}_{ni} \) is now a stochastic vector, so that \(\mathbf {Y}_{n}(\mathbf {z}) = \{Y_{nk}(\mathbf {z})\}_{k=1}^d\) and \(\mathbf {V}_{ni} = \{V_{nik}\}_{k=1}^d\).

We proceed to show that

$$\begin{aligned} \sum _{k}t_kY_{nk}(\mathbf {z}) \mathop {\rightarrow }\limits ^{\mathcal {L}} \sum _kt_kZ^*_k, \end{aligned}$$
(20)

where \(\mathbf {t}=(t_1,\ldots ,t_d)\) is an arbitrary vector of constants and \(\mathbf {Z}^* = (Z_1^*,\ldots ,Z_d^*)\) is a jointly normally distributed vector. Asymptotic normality of the vector \(\mathbf {Y}_n(\mathbf {z})\) then follows from the Cramér–Wold device (Billingsley 2008, p. 383). First, if \(\mathbf {t}\mathbf {Y}_n(\mathbf {z})\) is asymptotically normal at all, it must converge to \(\mathbf {t}\mathbf {Z}^*\) because of Slutsky’s theorem and the asymptotic normality of each of the \(Y_{nk}\). The normality of \(\mathbf {t}\mathbf {Y}_n(\mathbf {z})\) follows immediately from the one-dimensional case by writing \(W_{ni}=\sum _{k=1}^dt_kV_{nik}\) so that \(\sum _{k=1}^dt_kY_{nk}(\mathbf {z}) = \sum _{i=1}^nW_{ni}\), where all summands are identically distributed as \(W_n = \sum _{k=1}^dt_kK_{\mathbf {h}_n}(\mathbf {Z}-\mathbf {z})u_k(\mathbf {Z}, \mathbf {\rho _0}) = \sum _{k=1}^dt_kV_{nk}\). Jensen’s inequality implies \(|\sum _{k=1}^d Z_k|^{2+\delta }\le d^{1+\delta }\sum _{k=1}^d|Z_k|^{2+\delta }\), and so

$$\begin{aligned}&\frac{\text {E}\left| W_n - \text {E}(W_n)\right| ^{2+\delta }}{n^{\delta /2}\sigma ^{2+\delta }(W_n)} = \frac{\text {E}\left| \sum t_kV_{nk} - \text {E}\big (\sum t_kV_{nk}\big )\right| ^{2+\delta }}{n^{\delta /2}\sigma ^{2+\delta }\big (\sum t_kV_{nk}\big )} \nonumber \\&\quad \le d ^{1+\delta } \sum _{k=1}^d\frac{|t_k|^{2+\delta }(h_{n1}h_{n2})^{1+\delta }\text {E}|V_{nk} - \text {E}(V_{nk})|^{2+\delta }}{(nh_{n1}h_{n2})^{\delta /2}(h_{n1}h_{n2})^{1+\delta /2}\sigma ^{2+\delta }\big (\sum t_kV_{nk}\big )}. \end{aligned}$$
(21)

Recall that all variables are on the same Gaussian scale, and that all bandwidths tend to zero at the same rate. Therefore, it does not matter which bandwidths we use in the above expression. Further, the variance in the denominator of (21) stays away from zero because of (17). Following the same reasoning as in the one-parameter case (19), the Lyapunov condition is satisfied for the \(W_n\), implying (20), and so the vector \(\mathbf {Y}_n(\mathbf {z})\) is jointly asymptotically normal.

It remains to show that the asymptotic covariance matrix is diagonal. Indeed, the covariance between two local correlation estimates with no common index goes to zero as \(n^{-1}\). If they share a common index, one can go through the arguments below once again and see that their covariance \(\text {Cov}(\widehat{\rho }_{ij}, \widehat{\rho }_{jk})\) tends to zero as \((nh_n)^{-1}\). Both rates are negligible compared to \((nh_n^2)^{-1}\).

Assume without loss of generality that we have four variables \(Z_1,\ldots ,Z_4\) with joint density \(f_{\mathbf {Z}}(\mathbf {z})\) and that we estimate the local correlations \(\widehat{\rho }_{12}\) and \(\widehat{\rho }_{34}\) according to the scheme described in Sect. 2. Again, we identify the parameters with single indices, so that in this case \(\mathbf {\rho } = (\rho _1, \rho _2)\). They are estimated independently of each other by maximising the local likelihood functions \(L_{n,1}(\rho _{1}, Z_1,Z_2)\) and \(L_{n,2}(\rho _{2}, Z_3,Z_4)\), as defined by Eq. (4). Taylor-expanding the estimating equations \(\partial L_{n,1}/\partial \rho _{1} = 0\) and \(\partial L_{n,2}/\partial \rho _{2} = 0\) about the population values \(\rho _{1,0}\) and \(\rho _{2,0}\), respectively, yields

$$\begin{aligned} 0&= \left( \begin{array}{c}\partial L_{n,1}(\widehat{\rho }_{1})/\partial \rho _{1} \\ \partial L_{n,2}(\widehat{\rho }_{2})/\partial \rho _{2} \end{array}\right) = \left( \begin{array}{c} S_1(\widehat{\rho }_{1}) \\ S_2(\widehat{\rho }_{2})\end{array}\right) \\&= \left( \begin{array}{c}S_1(\rho _{1,0}) \\ S_2(\rho _{2,0})\end{array}\right) + \left( \begin{array}{cc} \partial S_1(\rho _1^*)/\partial \rho _{1} &{} 0 \\ 0 &{}\partial S_2(\rho _2^*)/\partial \rho _{2} \end{array}\right) \nonumber \\&\quad \times \,\left( \begin{array}{c}\widehat{\rho }_{1} - \rho _{1,0} \\ \widehat{\rho }_{2} - \rho _{2,0} \end{array}\right) , \end{aligned}$$

where \(\rho _k^*\) again lies between \(\widehat{\rho }_k\) and \(\rho _{k,0}\). More compactly, we write

$$\begin{aligned} (nh^2)^{1/2}(\widehat{\mathbf {\rho }} - \mathbf {\rho }_0) = -\mathbf {U}^{-1}(\mathbf {\rho }^*)(nh_n^2)^{1/2}\mathbf {S}(\mathbf {\rho }_0) \end{aligned}$$

where \(\mathbf {U}\) is the diagonal matrix of derivatives. The non-zero elements in \(\mathbf {U}\) converge, as \(n\rightarrow \infty \) and \(h\rightarrow 0\), to the quantities \(J_{1}\) and \(J_{2}\), which we have seen to be

$$\begin{aligned} J_k = u_k^2(\mathbf {z}_k,\rho _{0,k})\psi (\mathbf {z}_k,\rho _{0,k}), \quad k = 1,2. \end{aligned}$$

Denote by \(\mathbf {M}_{h}\) the covariance matrix of \(\sqrt{nh^2}\mathbf {S}(\mathbf {\rho }_0)\). The diagonal elements of \(\mathbf {M}_{\mathbf {h}}\) are given by

$$\begin{aligned} M_{k} = u_k^2(\mathbf {z}_k,\rho _{k,0}(\mathbf {z}_k))f_k(\mathbf {z}_k)\int K^2(\mathbf {y}_k)\,\text {d}\mathbf {y}_k. \end{aligned}$$

The off-diagonal element in \(\mathbf {M}_{\mathbf {h}}\) is \(O(h^2)\), because

$$\begin{aligned} \mathbf {M}_{\mathbf {h}}^{(1,2)}&= \mathbf {M}_{\mathbf {h}}^{(2,1)} \\&= h^2\int K_{\mathbf {h}}(\mathbf {y}_1-\mathbf {z}_1)K_{\mathbf {h}}(\mathbf {y}_2-\mathbf {z}_2) \\&\quad \times u_1(\mathbf {y_1},\rho _{1,0}(\mathbf {z_1}))u_2(\mathbf {y_2},\rho _{2,0}(\mathbf {z_2})) f_{\mathbf {Z}}(\mathbf {y})\,\text {d}\mathbf {y} \\&\quad -\text {a higher order term}. \end{aligned}$$

Writing \(\mathbf {J}_{\mathbf {h}} = \text {diag}(J_{\mathbf {h},1}, J_{\mathbf {h},2})\), where \(J_{\mathbf {h},k}\) was defined in (15), we collect these results and write the covariance matrix of \(\sqrt{nh^2}(\widehat{\rho }_{1},\widehat{\rho }_{2})^T\) in terms of its asymptotic order:

$$\begin{aligned} \mathbf {J}_{\mathbf {h}}^{-1}\mathbf {M}_{\mathbf {h}}(\mathbf {J}_{\mathbf {h}}^{-1})^T&\sim \left( \begin{array}{cc}J_{\mathbf {h},1}^{-1}&{}0\\ 0&{}J_{\mathbf {h},2}^{-1}\end{array}\right) \left( \begin{array}{cc}M_{\mathbf {h},1}&{}h^2\\ h^2&{}M_{\mathbf {h},2}\end{array}\right) \left( \begin{array}{cc}J_{\mathbf {h},1}^{-1}&{}0\\ 0&{}J_{\mathbf {h},2}^{-1}\end{array}\right) \\&\rightarrow \left( \begin{array}{cc}M_1/J_1^2&{}0\\ 0&{}M_2/J_2^2\end{array}\right) , \end{aligned}$$

as \(h\rightarrow 0\), indicating that the asymptotic covariance between \(\widehat{\rho }_{1}\) and \(\widehat{\rho }_{2}\) tends to zero as \(n^{-1}\). The same procedure must be repeated in order to establish \(\text {Cov}(\widehat{\rho }_{ij}, \widehat{\rho }_{jk}) = O((nh)^{-1})\).

1.3 A.3: Proof of Lemma 1

By inspecting the preceding proof of Theorems 2 and 3, we see that Lemma 1 holds if the asymptotic distribution of \(Y_n(\mathbf {z})\) in (16) remains unchanged when we replace the marginally standard normal observations \(\mathbf {Z}_n\) with the pseudo-observations \(\widehat{\mathbf {Z}}_n\) defined by (2). Apart from the factor \(u(\cdot )\), this is exactly the same expression as that analysed in Proposition 3.1 of Geenens et al. (2014), so we proceed to show that this difference does not alter their proof beyond making the algebraic expressions somewhat more complicated.

We have assumed the bivariate kernel function to be the product of two univariate kernels, so write in this section \(K(\mathbf {z})=K(z_1)K(z_2)\), even though that is a slight abuse of notation. Further, and following the notation of Geenens et al. (2014), write

$$\begin{aligned} J_{\mathbf {z},h}(\mathbf {v})&= K\left( \frac{z_1 - \varPhi ^{-1}(v_1)}{h}\right) K\left( \frac{z_2 - \varPhi ^{-1}(v_2)}{h}\right) \\&\quad \times u(\varPhi ^{-1}(\mathbf {v})), \end{aligned}$$

where \(\mathbf {v} = (v_1,v_2) \in (0,1)^2\). Thus, writing \(k(z)=K'(z)\) and \(u_i(\mathbf {z})=\partial u(\mathbf {z})/\partial z_i\), we have

$$\begin{aligned} \frac{\partial J_{\mathbf {z},h}}{\partial v_1}&= k\left( \frac{z_1 - \varPhi ^{-1}(v_1)}{h}\right) \nonumber \\&\quad \times K\left( \frac{z_2 - \varPhi ^{-1}(v_2)}{h}\right) \frac{u(\varPhi ^{-1}(\mathbf {v}))}{h\phi (\varPhi ^{-1}(v_1))} \nonumber \\&\quad + K\left( \frac{z_1 - \varPhi ^{-1}(v_1)}{h}\right) \nonumber \\&\quad \times K\left( \frac{z_2 - \varPhi ^{-1}(v_2)}{h}\right) \frac{u_1(\varPhi ^{-1}(\mathbf {v}))}{\phi (\varPhi ^{-1}(v_1))}, \\ \frac{\partial J_{\mathbf {z},h}}{\partial v_2}&= K\left( \frac{z_1 - \varPhi ^{-1}(v_1)}{h}\right) \nonumber \\&\quad \times k\left( \frac{z_2 - \varPhi ^{-1}(v_2)}{h}\right) \frac{u(\varPhi ^{-1}(\mathbf {v}))}{h\phi (\varPhi ^{-1}(v_2))} \nonumber \\&\quad + K\left( \frac{z_1 - \varPhi ^{-1}(v_1)}{h}\right) \nonumber \\&\quad \times K\left( \frac{z_2 - \varPhi ^{-1}(v_2)}{h}\right) \frac{u_2(\varPhi ^{-1}(\mathbf {v}))}{\phi (\varPhi ^{-1}(v_2))}, \end{aligned}$$

which means that the expressions for \(R_n(\mathbf {z})\), \(B_{n,1}(\mathbf {z})\) and \(B_{n,2}(\mathbf {z})\), as defined by Geenens et al. (2014), in our case have four terms instead of just one, resulting from the products of the terms in \(\frac{\partial J_{\mathbf {z},h}}{\partial v_1}\) and \(\frac{\partial J_{\mathbf {z},h}}{\partial v_2}\). We do not write out any more details here, because that would require a much bigger body of notation. Straightforward algebra, however, exploiting the boundedness of \(K(\mathbf {z})\) and \(k(\mathbf {z})\) [from assumption (iv)], and the boundedness of \(u(\mathbf {z})\) [defined in (14)] for a fixed \(\mathbf {z}\), reveals that each of our terms is bounded by a constant times the corresponding term in Geenens et al. (2014), which is enough to prove the result.

1.4 A.4: Proof of Theorem 4

It follows from Lemma 1 and the delta method that \(\widehat{f}_{\mathbf {Z}}(\mathbf {z})\) is asymptotically normal. It remains to show that the asymptotic normality still holds after the final back-transformation (8), with suitable estimates for the marginal density and distribution functions. Under assumption (viii), the normalised estimates \(\sqrt{nh^2}\left( \widehat{f}_i(x_i) - f_i(x_i)\right) \) and \(\sqrt{nh^2}\left( \widehat{F}_i(x_i) - F_i(x_i)\right) \) both converge in distribution to the constant 0, which in turn implies \(\widehat{f}_i(x_i) - f_i(x_i) = o_P(1)\) and \(\widehat{F}_i(x_i) - F_i(x_i) = o_P(1)\). It follows that

$$\begin{aligned} \phi \left( \varPhi ^{-1}(\widehat{F}_i(x_i))\right)&= \phi \left( \varPhi ^{-1}(F_i(x_i))\right) \\&\quad + \phi '\left( \varPhi ^{-1}(F_i(x_i))\right) \nonumber \\&\quad \times \left[ \varPhi ^{-1}\right] '(F_i(x_i))(\widehat{F}_i(x_i) - F_i(x_i)) \\&\quad + \text {higher order terms} \end{aligned}$$

where the second term is \(o_P(1)\) for all x such that \(F(x) \in (0,1)\). We can then write

$$\begin{aligned} \frac{\widehat{f}_i(x_i)}{\phi \left( \varPhi ^{-1}(\widehat{F}_i(x_i))\right) }&= \frac{f_i(x_i) + o_P(1)}{\phi \left( \varPhi ^{-1}(F_i(x_i))\right) } \\&\quad \times \left( 1 - \frac{o_P(1)}{\phi \left( \varPhi ^{-1}(F_i(x_i))\right) } + \cdots \right) \\&= \frac{f_i(x_i)}{\phi \left( \varPhi ^{-1}(F_i(x_i))\right) } + o_P(1), \end{aligned}$$

from which it follows that

$$\begin{aligned} \prod _{i=1}^p \frac{\widehat{f}_i(x_i)}{\phi \left( \varPhi ^{-1}(\widehat{F}_i(x_i))\right) } = \prod _{i=1}^p \frac{f_i(x_i)}{\phi \left( \varPhi ^{-1}(F_i(x_i))\right) } + o_P(1). \end{aligned}$$

By Slutsky’s Theorem, we have that \(\widehat{f}(\mathbf {x})\) as defined by Eq. (8) is asymptotically normal. The expression for the asymptotic variance of the density estimate follows from Theorem 3, Lemma 1, and the delta method applied to the asymptotic covariance matrix of the local correlations (11), using the function (1).
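
For concreteness, here is a hedged R sketch of this final step for \(p=2\). It assumes that (8) has the product form appearing in the display above, uses the CRAN package logspline for the marginal estimates \(\widehat{f}_i\) and \(\widehat{F}_i\) (the marginal estimator treated in Theorem 5), and takes an estimate \(\widehat{f}_{\mathbf {Z}}\) of the density of the transformed vector as given.

```r
## Hedged sketch of the back-transformation (8) for p = 2; not the paper's code.
## Assumes (8) has the product form in the display above, and uses the CRAN
## package "logspline" for the marginal estimates (cf. Theorem 5).
library(logspline)

back_transform <- function(x, X, fZ_hat) {
  ## x: evaluation point (length 2), X: n x 2 data matrix,
  ## fZ_hat: estimate of the density of the transformed vector Z,
  ##         a function accepting a length-2 argument
  fits <- lapply(seq_len(ncol(X)), function(j) logspline(X[, j]))
  Fi <- mapply(function(fit, xi) plogspline(xi, fit), fits, x)  # marginal cdfs
  fi <- mapply(function(fit, xi) dlogspline(xi, fit), fits, x)  # marginal densities
  z  <- qnorm(Fi)                     # x transformed towards standard normality
  fZ_hat(z) * prod(fi / dnorm(z))     # back-transformed density estimate at x
}
```

Here \(\widehat{f}_{\mathbf {Z}}\) would itself come from the local likelihood fit of the preceding sections; the sketch only shows how the marginal estimates enter the back-transformation.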

1.5 A.5: Proof of Theorem 5

Stone (1990) provides large-sample theory for logspline density estimates. The asymptotic bias is shown to be asymptotically negligible provided that the true density is twice continuously differentiable. He shows further that \(\widehat{f}_i\) is asymptotically normal with asymptotic variance of order \(O(n^{-(0.5 + \epsilon )})\), where \(\epsilon \in (0,1/2)\) is a tuning parameter that controls the rate at which new nodes are added in the logspline model. Stone (1990) develops theory for compactly supported densities only, but we proceed using a truncation argument to show that the asymptotic normality holds equally well for densities \(f_i\) satisfying \(f_i = o(|z|^{-(5/2 + \gamma )})\), where \(\gamma = 2\epsilon /(1 - 2\epsilon )\) (see below) is close to zero if \(\epsilon \) is small. If \(\epsilon \rightarrow 1/2\), then \(\gamma \rightarrow \infty \), which is intuitively reasonable because the number of nodes then increases very slowly, meaning that the probability of extreme observations beyond the smallest and largest node must necessarily be small.

Denote by \(J_n\) the number of nodes that are used to fit the logspline model to f(z) based on iid observations \(Z_1, \ldots , Z_n\). Stone (1990) assumes \(J_n=o(n^{0.5-\epsilon })\). Construct a sequence \(L_n\), \(n = 1, 2, \ldots \), that is \(o(J_n)\), and define a new stochastic variable by truncation:

$$\begin{aligned} Z^{(n)} = Z_n1_{|Z_n| \le L_n} + U_n1_{|Z_n|>L_n}, \end{aligned}$$

where \(U_n\) is uniformly distributed on \([-L_n, L_n]\). Let \(c_n = \int _{|z|\le L_n}f(z)\,\text {d}z\), then the density of \(Z^{(n)}\) is given by

$$\begin{aligned} f^{(n)}(z) = c_n^{-1}f(z)1_{|z|\le L_n}. \end{aligned}$$
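
As a small illustration of this truncation (not part of the proof), the following R snippet uses a t-density and the choice \(L_n = n^{0.4}\); both are hypothetical examples, the latter being compatible with \(L_n = o(n^{0.5-\epsilon })\) whenever \(\epsilon < 0.1\).

```r
## Illustration only: the truncated variable Z^(n). The t(5) density and
## L_n = n^0.4 are hypothetical choices; n^0.4 = o(n^(0.5 - epsilon)) for
## epsilon < 0.1.
set.seed(1)
n  <- 1e4
Z  <- rt(n, df = 5)                 # tails of order |z|^(-6), within the stated class
Ln <- n^0.4
U  <- runif(n, -Ln, Ln)
Zn <- ifelse(abs(Z) <= Ln, Z, U)    # Z^(n) = Z 1{|Z| <= Ln} + U 1{|Z| > Ln}
mean(abs(Z) > Ln)                   # fraction of observations actually truncated
```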

The index i in \(f_i=f_i(z_i)\) is not important here, and will be omitted. Let \(\widehat{f}_n\) be the logspline estimate of f based on \(Z_1, \ldots , Z_n\), and let \(\widehat{f}_n^{(n)}\) be the logspline estimate of \(f^{(n)}\) based on \(Z_1^{(n)}, \ldots , Z_n^{(n)}\). We wish to show that \(\sqrt{n^{0.5 + \epsilon }}(\widehat{f}(z) - f(z))\) is asymptotically normal, and make the following decomposition:

$$\begin{aligned} \sqrt{n^{0.5+\epsilon }}(\widehat{f}(z) - f(z))&= \sqrt{n^{0.5 + \epsilon }}\bigg ( \left\{ \widehat{f}(z) - \widehat{f}^{(n)}(z)\right\} \nonumber \\&\quad + \left\{ \widehat{f}^{(n)}(z) - f^{(n)}(z)\right\} \nonumber \\&\quad + \left\{ f^{(n)}(z) - f(z)\right\} \bigg ). \end{aligned}$$
(22)

The first parenthesis converges in probability to zero provided that the tails of f are not too heavy. To see this, assume that there exists a \(z_0>0\) such that \(f(z)<M_1|z|^{-s}\) for all \(|z|>z_0\) and some constant \(M_1\). It follows from elementary calculus that

$$\begin{aligned} P\left( \left| \widehat{f}(z) - \widehat{f}^{(n)}(z)\right| >0\right)&\le 1-(P(|Z|\le L_n))^n \\&\le 1- (1- M_2L_n^{1-s})^n, \end{aligned}$$

for a new constant \(M_2\). We have from a Taylor expansion that the right-hand side is \(O(n^{(1/2-\epsilon )(1-s) + 1})\) since \(L_n = o(J_n) = o(n^{0.5-\epsilon })\), and so balancing this with the convergence rate in the normal approximation, \(n^{1/4+\epsilon /2}\), yields \(s^* = 5/2 + 2\epsilon /(1-2\epsilon )\) as the limiting value for s. Thus \(\gamma = 2\epsilon /(1-2\epsilon )\).

It follows easily that the third parenthesis in (22) converges to zero if we assume that \(f=o(|z|^{-s^*})\).

To see that the middle parenthesis in (22) is asymptotically normal, choose an arbitrary constant \(T>0\). Then there exists a positive integer N such that \([-T,T]\subset [-L_n,L_n]\) for all \(n>N\). Make a new decomposition:

$$\begin{aligned}&\sqrt{n^{0.5+\epsilon }}\bigg (\left\{ \widehat{f}^{(n)}(z) - \widehat{f}^{(T)}(z)\right\} + \left\{ \widehat{f}^{(T)}(z) - f^{(T)}(z)\right\} \nonumber \\&\qquad \qquad \qquad +\left\{ f^{(T)}(z) - f^{(n)}(z)\right\} \bigg ) \end{aligned}$$
(23)

Letting \(n\rightarrow \infty \), the middle parenthesis on the right-hand side is, for any T, asymptotically normally distributed according to the theory of Stone (1990). The first and third parentheses in (23) can be made arbitrarily small by choosing a large enough T. It follows from Slutsky’s Theorem that the logspline estimates of the marginal densities are asymptotically normal with convergence rate \(\sqrt{n^{1/2 + \epsilon }}\), provided their tails are thinner than those of \(|z|^{-s^*}\).

The exact same argument goes through for the marginal distribution and quantile functions as well, with the convergence rate equal to the usual \(\sqrt{n}\).

Appendix B: Supplementary material

The supplementary file “code.zip” contains the data set that was used in Sect. 6, and necessary computer code for calculating the locally Gaussian density estimates, written in the R programming language (R Core Team 2015).

Cite this article

Otneim, H., Tjøstheim, D. The locally Gaussian density estimator for multivariate data. Stat Comput 27, 1595–1616 (2017). https://doi.org/10.1007/s11222-016-9706-6
