Abstract
It is well known that the Curse of Dimensionality causes the standard Kernel Density Estimator to break down quickly as the number of variables increases. In non-parametric regression, this effect is alleviated in various ways, for example by assuming additivity or some other simplifying structure on the interaction between variables. This paper presents the Locally Gaussian Density Estimator (LGDE), which introduces a similar idea to the problem of density estimation. The LGDE is a new method for the non-parametric estimation of multivariate probability density functions. It is based on preliminary transformations of the marginal observation vectors towards standard normality, and a simplified local likelihood fit of the resulting distribution with standard normal marginals. The LGDE is introduced, and asymptotic theory is derived. In particular, it is shown that the LGDE converges at a speed that does not depend on the dimension. Examples using real and simulated data confirm that the new estimator performs very well for finite sample sizes.
References
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44(2), 182–198 (2009)
Andrews, D.W.K.: Generic uniform convergence. Econom. Theory 8(2), 241–257 (1992)
Azzalini, A.: The skew-normal distribution and related multivariate families. Scand. J. Stat. 32(2), 159–188 (2005)
Bedford, T., Cooke, R.M.: Vines: a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)
Berentsen, G.D., Tjøstheim, D.: Recognizing and visualizing departures from independence in bivariate data using local Gaussian correlation. Stat. Comput. 24(5), 785–801 (2014)
Berentsen, G.D., Støve, B., Tjøstheim, D., Nordbø, T.: Recognizing and visualizing copulas: an approach using local Gaussian approximation. Insur. Math. Econ. 57, 90–103 (2014)
Billingsley, P.: Probability and Measure. Wiley, Oxford (2008)
Bowman, A.W.: An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353–360 (1984)
Friedman, J.H., Stuetzle, W., Schroeder, A.: Projection pursuit density estimation. J. Am. Stat. Assoc. 79(387), 599–608 (1984)
Geenens, G.: Probit transformation for kernel density estimation on the unit interval. J. Am. Stat. Assoc. 109(505), 346–358 (2014)
Geenens, G., Charpentier, A., Paindaveine, D.: Probit transformation for nonparametric kernel estimation of the copula density. arXiv:1404.4414 (preprint) (2014)
Genest, C., Segers, J.: On the covariance of the asymptotic empirical copula process. J. Multivar. Anal. 101(8), 1837–1845 (2010)
Hall, P.: On Kullback–Leibler loss and density estimation. Ann. Stat. 15(4), 1491–1519 (1987)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer, Berlin (2009)
Hjort, N.L., Glad, I.K.: Nonparametric density estimation with a parametric start. Ann. Stat. 23(3), 882–904 (1995)
Hjort, N.L., Jones, M.C.: Locally parametric nonparametric density estimation. Ann. Stat. 24(4), 1619–1647 (1996)
Hwang, J.-N., Lay, S.-R., Lippman, A.: Nonparametric multivariate density estimation: a comparative study. IEEE Trans. Signal Process. 42(10), 2795–2810 (1994)
Jones, M.C.: Simple boundary correction for kernel density estimation. Stat. Comput. 3(3), 135–146 (1993)
Loader, C.R.: Local likelihood density estimation. Ann. Stat. 24(4), 1602–1618 (1996)
Mikosch, T.: Copulas: tales and facts. Extremes 9(1), 3–20 (2006)
Nagler, T., Czado, C.: Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. J. Multivar. Anal. 151, 69–89 (2016)
Newey, W.K.: Uniform convergence in probability and stochastic equicontinuity. Econometrica 59(4), 1161–1167 (1991)
Otneim, H., Karlsen, H.A., Tjøstheim, D.: Bias and bandwidth for local likelihood density estimation. Stat. Probab. Lett. 83(5), 1382–1387 (2013)
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
Politis, D.N., Romano, J.P.: Multivariate density estimation with general flat-top kernels of infinite order. J. Multivar. Anal. 68(1), 1–25 (1999)
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2015)
Ruppert, D., Cline, D.B.H.: Bias reduction in kernel density estimation by smoothed empirical transformations. Ann. Stat. 22(1), 185–210 (1994)
Schervish, M.J.: Theory of Statistics. Springer, Berlin (1995)
Severini, T.A.: Likelihood Methods in Statistics. Oxford Science Publications. Oxford University Press, Oxford (2000)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis, vol. 26. CRC Press, London (1986)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)
Sperlich, S., Tjøstheim, D., Yang, L.: Nonparametric estimation and testing of interaction in additive models. Econom. Theory 18(2), 197–251 (2002)
Stone, C.J.: An asymptotically optimal window selection rule for kernel density estimates. Ann. Stat. 12(4), 1285–1297 (1984)
Stone, C.J.: Large-sample inference for log-spline models. Ann. Stat. 18(2), 717–741 (1990)
Stone, C.J., Hansen, M.H., Kooperberg, C., Truong, Y.K.: Polynomial splines and their tensor products in extended linear modeling: 1994 Wald Memorial Lecture. Ann. Stat. 25(4), 1371–1470 (1997)
Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B (Methodological) 36(2), 111–147 (1974)
Støve, B., Tjøstheim, D., Hufthammer, K.O.: Using local Gaussian correlation in a nonlinear re-examination of financial contagion. J. Empir. Fin. 25, 62–82 (2014)
Tjøstheim, D., Hufthammer, K.O.: Local Gaussian correlation: a new measure of dependence. J. Econom. 172(1), 33–48 (2013)
Van der Vaart, A.W.: Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge (2000)
Wand, M.P., Jones, M.C.: Multivariate plug-in bandwidth selection. Comput. Stat. 9(2), 97–116 (1994)
Wand, M.P., Marron, J.S., Ruppert, D.: Transformations in density estimation. J. Am. Stat. Assoc. 86(414), 343–353 (1991)
Acknowledgments
The authors gratefully acknowledge Bård Støve for supplying the data set, and Kjersti Aas for valuable comments in the preparation of this paper. The authors would also like to thank two anonymous referees, whose comments have greatly improved this paper.
Appendices
Appendix A: Proofs
1.1 A.1: Proof of Theorem 1
The method of proof is the same as that used by Severini (2000, pp. 105–107) for ordinary maximum likelihood estimates. The proof requires the additional assumption of uniform convergence in probability of the local likelihood function:
The bivariate version of (6) satisfies condition (13) provided that condition (ii) is fulfilled. To see this, consider \(\psi (\cdot ,\rho )\) as a function of the parameter; it is bounded and differentiable to any order on the compact set \(\varTheta \), and so is its logarithm. Thus \(g(\rho ) = \log \psi (\cdot ,\rho )f(\cdot ) - \psi (\cdot ,\rho )\) is uniformly continuous there, so for every \(\epsilon >0\) there exists a \(\delta >0\) such that \(|\rho _1-\rho _2|<\delta \) implies \(|g(\rho _1)-g(\rho _2)|<\epsilon \). Multiplying by a kernel and integrating over a different variable preserves this property, because if \(|\rho _1-\rho _2|<\delta \), then
The \(\epsilon \) and \(\delta \) do not depend on \(\mathbf {h}\) nor n, so \(\{Q_{\mathbf {h}_n,K}(\rho )\}\) form an equicontinuous family of functions. Further, and again exploiting the smoothness of \(\psi (\cdot ,\rho )\) on a compact set \(\varTheta \), the local likelihood functions are Lipschitz continuous there by the mean value theorem. The conditions in Corollary 2.2 by Newey (1991) are thus satisfied, and condition (13) follows thereof. It follows from the uniform convergence that
The rest of the argument follows exactly that of Severini (2000, pp. 105–107) for ordinary maximum likelihood estimates.
1.2 A.2: Proof of Theorems 2 and 3
We establish joint asymptotic normality of the local correlation vector by first following the standard argument for ordinary maximum likelihood estimates in the bivariate, and thus one-parameter, case, and then applying a central limit argument; this amounts to a proof of Theorem 2. We then use the Cramér–Wold device to cover the multi-parameter case. Finally, we show that the off-diagonal elements of the covariance matrix vanish asymptotically. In the bivariate case, we must verify the following conditions in order to use Theorem 7.63 in Schervish (1995) and Theorem 1A of Parzen (1962):
-
(I)
The parametric family \(\psi (\mathbf {z},\rho )\) is continuously differentiable with respect to \(\rho \);
-
(II)
\(\int |u(\mathbf {z},\rho _0)| f(\mathbf {z})\,\text {d}\mathbf {z}<\infty \);
-
(III)
There exists a function \(T_r(\mathbf {z},\rho _0)\) such that, for each \(\rho _0\in \text {int}(\varTheta )\),
$$\begin{aligned} \sup _{|\rho -\rho _0|\le r} \left| \partial ^2 L_n(\rho _0, \mathbf {z})/\partial \rho ^2 - \partial ^2L_n(\rho ,\mathbf {z})/\partial \rho ^2 \right| \le T_r(\mathbf {z},\rho _0) \end{aligned}$$with \(\lim _{r\rightarrow 0}\text {E}T_r(\mathbf {Z},\rho _0) = 0\) (stochastic equicontinuity).
The parametric family is Gaussian, so condition (I) is obviously true. The local score function \(u(\mathbf {z},\rho ) = \partial \log \psi (\mathbf {z},\rho )/\partial \rho \) in the bivariate Gaussian case is given by
$$\begin{aligned} u(\mathbf {z},\rho ) = \frac{\rho (1-\rho ^2) + (1+\rho ^2)z_1z_2 - \rho (z_1^2+z_2^2)}{(1-\rho ^2)^2}, \end{aligned}$$
and the stochastic variable \(\mathbf {Z} = (Z_1,Z_2)\), having density \(f_{\mathbf {Z}}\), has moments of all orders since the marginals are standard normal. Therefore, \(\text {E}|u(\mathbf {Z},\rho )|<\infty \), so (II) is satisfied. Further, Andrews (1992) shows that uniform continuity of \(\partial ^2L_n(\rho )/\partial \rho ^2\), as well as Lipschitz continuity of \(|\partial ^2L_n(\rho ,\mathbf {z})/\partial \rho ^2 - \partial ^2L_n(\rho _0,\mathbf {z})/\partial \rho ^2|\), suffices for the stochastic equicontinuity required in condition (III). The argument in Appendix A.1 also goes through here.
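As a sanity check (not part of the original argument), the bivariate Gaussian local score \(u(\mathbf {z},\rho ) = \partial \log \psi (\mathbf {z},\rho )/\partial \rho \) and a closed-form expression for it, derived here by hand rather than quoted from the paper, can be compared symbolically:

```python
import sympy as sp

z1, z2, rho = sp.symbols('z1 z2 rho')

# Bivariate Gaussian density with standard normal margins and correlation rho
psi = sp.exp(-(z1**2 - 2*rho*z1*z2 + z2**2)/(2*(1 - rho**2))) \
      / (2*sp.pi*sp.sqrt(1 - rho**2))

# Local score u = d log(psi) / d rho; expand the logarithm first so the
# derivative is a rational expression
u = sp.diff(sp.expand_log(sp.log(psi), force=True), rho)

# Hand-derived candidate closed form for the score
u_closed = (rho*(1 - rho**2) + (1 + rho**2)*z1*z2
            - rho*(z1**2 + z2**2))/(1 - rho**2)**2

diff = sp.simplify(u - u_closed)   # should simplify to 0
```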
Using a one-term Taylor expansion of the local score function \(\partial L_n(\widehat{\rho },\mathbf {z})/\partial \rho \), and following Schervish (1995), p. 422, in writing 0 as \(o_P\left( (nh_n^2)^{-1/2}\right) \), we get
where \(B_{n,h} = \partial ^2 L_n(\rho ^*, \mathbf {z})/\partial \rho ^2\), and \(\rho ^*\) lies between \(\rho _0\) and \(\widehat{\rho }\). As \(n\rightarrow \infty \), this quantity tends to its expectation, which we denote by \(J_h\), and is given by
The arguments of Hjort and Jones (1996), as well as the consistency of \(\widehat{\rho }_n\), can be used to see that,
Further, the variance of \(\sqrt{nh^2}\partial L_n(\rho _0,\mathbf {Z})/\partial \rho \) approaches \(M_h\) as \(n\rightarrow \infty \), where
The second term vanishes as \(\mathbf {h}\rightarrow 0\), so we have in the limit that
Following the details of Theorem 7.63 in Schervish (1995), it follows that
provided that the quantity
is asymptotically normal; this follows along the lines of Parzen (1962), as we now proceed to establish.
Of the two terms comprising the local likelihood function (4), only the first depends on data. It follows readily from Theorem 1A of Parzen (1962) that the variance of the summands in (16), all identically distributed as \(V_n = K_{\mathbf {h}_n}(\mathbf {Z} - \mathbf {z})u(\mathbf {Z},\rho _0)\), satisfies
Further, a simple Taylor expansion reveals that
The quantity in (17) is finite because of assumption (iv) in Theorem 2. Further,
which tends to zero as \(n\rightarrow \infty \) because of (17), (18) and the second part of assumption (iv), and where, for a stochastic variable X, here and in the sequel we use the notation \(\sigma (X) = \text {sd}(X)\). The summands comprising \(Y_n(\mathbf {z})\) therefore satisfy the Lyapunov, and thus the Lindeberg, condition, so \(Y_n(\mathbf {z})\) is asymptotically normal.
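Purely as an illustration, and not part of the proof, the approximate normality of the kernel-weighted score average \(Y_n(\mathbf {z})\) in (16) can be inspected in a small simulation; the bandwidth, evaluation point, sample size and number of replications below are arbitrary choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
rho0, h = 0.5, 0.8                   # illustrative correlation and bandwidth
z = np.array([0.3, -0.2])            # illustrative evaluation point
C = np.linalg.cholesky(np.array([[1.0, rho0], [rho0, 1.0]]))

def u(Z, rho):
    # Local score of the bivariate Gaussian in rho (standard normal margins)
    z1, z2 = Z[:, 0], Z[:, 1]
    return (rho*(1 - rho**2) + (1 + rho**2)*z1*z2
            - rho*(z1**2 + z2**2))/(1 - rho**2)**2

def Y_n(n):
    # Kernel-weighted average of the local score, as in (16)
    Z = rng.standard_normal((n, 2)) @ C.T
    K = np.exp(-0.5*np.sum(((Z - z)/h)**2, axis=1))/(2*np.pi*h*h)
    return np.mean(K*u(Z, rho0))

reps = np.array([Y_n(2000) for _ in range(500)])
std = (reps - reps.mean())/reps.std()
skew = np.mean(std**3)   # small for an approximately normal distribution
```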
Having established asymptotic normality for each \(\widehat{\rho }_k\) (and thus proven Theorem 2), we extend the argument above to the p-variate, and thus \(d=p(p-1)/2\)-parameter, case; let \(\mathbf {\rho } = (\rho _1,\ldots ,\rho _d)\) be the vector of local correlations, let \(\mathbf {u}(\mathbf {z},\mathbf {\rho }_0) = (u_1(\mathbf {z},\mathbf {\rho }_0),\ldots ,u_d(\mathbf {z},\mathbf {\rho }_0))\) be the vector of score functions, defined as before by \(u_k(\mathbf {z},\mathbf {\rho }) = \partial \log \psi (\mathbf {z},\mathbf {\rho })/\partial \rho _k\), and, finally, note that \(\mathbf {Y}_n(\mathbf {z}) = n^{-1}\sum _{i=1}^n\mathbf {V}_{ni} \) is now a stochastic vector, so that \(\mathbf {Y}_{n}(\mathbf {z}) = \{Y_{nk}(\mathbf {z})\}_{k=1}^d\) and \(\mathbf {V}_{ni} = \{V_{nik}\}_{k=1}^d\).
We proceed to show that
where \(\mathbf {t}=(t_1,\ldots ,t_d)\) and \(\mathbf {Z}^* = (Z_1^*,\ldots ,Z_d^*)\) are an arbitrary vector of constants and a jointly normally distributed vector, respectively. Asymptotic normality of the vector \(\mathbf {Y}_n(\mathbf {z})\) then follows from the Cramér–Wold device (Billingsley 2008, p. 383). First, if \(\mathbf {t}\mathbf {Y}_n(\mathbf {z})\) is asymptotically normal at all, it must converge to \(\mathbf {t}\mathbf {Z}^*\) because of Slutsky's theorem and the asymptotic normality of each of the \(Y_{nk}\). The normality of \(\mathbf {t}\mathbf {Y}_n(\mathbf {z})\) follows immediately from the one-dimensional case by writing \(W_{ni}=\sum _{k=1}^dt_kV_{nik}\), so that \(\sum _{k=1}^dt_kY_{nk}(\mathbf {z}) = \sum _{i=1}^nW_{ni}\), where all summands are identically distributed as \(W_n = \sum _{k=1}^dt_kK_{\mathbf {h}_n}(\mathbf {Z}-\mathbf {z})u_k(\mathbf {z}, \mathbf {\rho _0}) = \sum _{k=1}^dt_kV_{nk}\). Jensen's inequality implies \(|\sum _{k=1}^d Z_k|^{2+\delta }\le d^{1+\delta }\sum _{k=1}^d|Z_k|^{2+\delta }\), and so
Recall that all variables are on the same Gaussian scale, and that all bandwidths tend to zero at the same rate. Therefore, it does not matter which bandwidths we use in the above expression. Further, the variance in the denominator of (21) stays away from zero because of (17). Following the same reasoning as in the univariate case (19), the Lyapunov condition is satisfied for the \(W_n\), implying (20), and so the vector \(\mathbf {Y}_n(\mathbf {z})\) is jointly asymptotically normal.
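The Jensen-type bound \(|\sum _{k=1}^d a_k|^{2+\delta }\le d^{1+\delta }\sum _{k=1}^d|a_k|^{2+\delta }\) used above can be checked numerically; the dimension \(d\) and the value of \(\delta \) below are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta = 5, 0.5   # illustrative dimension and delta

# Check |sum_k a_k|^{2+delta} <= d^{1+delta} * sum_k |a_k|^{2+delta}
# on randomly drawn vectors
ok = True
for _ in range(1000):
    a = rng.normal(size=d)
    lhs = abs(a.sum())**(2 + delta)
    rhs = d**(1 + delta)*np.sum(np.abs(a)**(2 + delta))
    ok &= bool(lhs <= rhs)
```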
It remains to show that the asymptotic covariance matrix is diagonal. Indeed, the covariance between two local correlation estimates with no common index tends to zero at the rate \(n^{-1}\). If they share a common index, one can go through the arguments below once again and see that their covariance \(\text {Cov}(\widehat{\rho }_{ij}, \widehat{\rho }_{jk})\) tends to zero at the rate \((nh_n)^{-1}\). Both rates are negligible compared to \((nh_n^2)^{-1}\).
Assume without loss of generality that we have four variables \(Z_1,\ldots ,Z_4\) with joint density \(f_{\mathbf {Z}}(\mathbf {z})\), and that we estimate the local correlations \(\widehat{\rho }_{12}\) and \(\widehat{\rho }_{34}\) according to the scheme described in Sect. 2. Again, we identify the parameters with single indices, so that in this case \(\mathbf {\rho } = (\rho _1, \rho _2)\). They are estimated independently of each other by maximising the local likelihood functions \(L_{n,1}(\rho _{1}, Z_1,Z_2)\) and \(L_{n,2}(\rho _{2}, Z_3,Z_4)\), as defined by Eq. (4). Taylor-expanding the estimating equations \(\partial L_{n,1}/\partial \rho _1 = 0\) and \(\partial L_{n,2}/\partial \rho _2 = 0\) about the population values \(\rho _{1,0}\) and \(\rho _{2,0}\), respectively, yields
where \(\rho _k^*\) again lies between \(\widehat{\rho }_k\) and \(\rho _{k,0}\). More compactly, we write
where \(\mathbf {U}\) is the diagonal matrix of derivatives. The non-zero elements in \(\mathbf {U}\) converge, as \(n\rightarrow \infty \) and \(h\rightarrow 0\), to the quantities \(J_{1}\) and \(J_{2}\), which we have seen to be
Denote by \(\mathbf {M}_{h}\) the covariance matrix of \(\sqrt{nh^2}\mathbf {S}(\mathbf {\rho }_0)\). The diagonal elements of \(\mathbf {M}_{\mathbf {h}}\) are given by
The off-diagonal element in \(\mathbf {M}_{\mathbf {h}}\) is \(O(h^2)\), because
Writing \(\mathbf {J}_h = \text {diag}(J_{\mathbf {h},1}, J_{\mathbf {h},2}) \), where \(J_{\mathbf {h},k}\) was defined in (15), we collect these results and write the covariance matrix of \(\sqrt{n\mathbf {h}^2}(\widehat{\rho }_{1},\widehat{\rho }_{2})^T\) in terms of its asymptotic order;
as \(h\rightarrow 0\), indicating that the asymptotic covariance between \(\widehat{\rho }_{1}\) and \(\widehat{\rho }_{2}\) tends to zero at the rate \(n^{-1}\). The same procedure can be repeated in order to establish \(\text {Cov}(\widehat{\rho }_{ij}, \widehat{\rho }_{jk}) = O((nh)^{-1})\).
1.3 A.3: Proof of Lemma 1
By inspecting the preceding proof of Theorems 2 and 3, we see that Lemma 1 holds if the asymptotic distribution of \(Y_n(\mathbf {z})\) in (16) remains unchanged when we replace the marginally standard normal observations \(\mathbf {Z}_n\) with the pseudo-observations \(\widehat{\mathbf {Z}}_n\) defined by (2). Apart from the factor \(u(\cdot )\), this is exactly the expression analysed in Proposition 3.1 of Geenens et al. (2014), so we proceed to show that this difference does not alter their proof beyond producing somewhat more complicated algebraic expressions.
We have assumed the bivariate kernel function to be the product of two univariate kernels, so write in this section \(K(\mathbf {z})=K(z_1)K(z_2)\), even though that is a slight abuse of notation. Further, and following the notation of Geenens et al. (2014), write
where \(\mathbf {v} = (v_1,v_2) \in (0,1)^2\). Thus, writing \(k(z)=K'(z)\) and \(u_i(\mathbf {z})=\partial u(\mathbf {z})/\partial z_i\), we have
which means that the expressions for \(R_n(\mathbf {z})\), \(B_{n,1}(\mathbf {z})\) and \(B_{n,2}(\mathbf {z})\), as defined by Geenens et al. (2014), in our case have four terms instead of just one, resulting from the products of \(\frac{\partial J_{\mathbf {z},h}}{\partial v_1}\) and \(\frac{\partial J_{\mathbf {z},h}}{\partial v_2}\). We omit further details here, as they would require considerably more notation. Straightforward algebra, however, exploiting the boundedness of \(K(\mathbf {z})\) and \(k(\mathbf {z})\) [from assumption (iv)], and the boundedness of \(u(\mathbf {z})\) [defined in (14)] for a fixed \(\mathbf {z}\), reveals that each of our terms is smaller than a constant times the corresponding term in Geenens et al. (2014), which is enough to prove the result.
1.4 A.4: Proof of Theorem 4
It follows from Lemma 1 and the delta method that \(\widehat{f}_{\mathbf {Z}}(\mathbf {z})\) is asymptotically normal. It remains to show that the asymptotic normality still holds after the final back-transformation (8), with suitable estimates for the marginal density and distribution functions. Under assumption (viii), the normalised estimates \(\sqrt{nh^2}\left( \widehat{f}_i(x_i) - f_i(x_i)\right) \) and \(\sqrt{nh^2}\left( \widehat{F}_i(x_i) - F_i(x_i)\right) \) both converge in distribution to the constant 0, which again implies \(\widehat{f}_i(x_i) - f_i(x_i) = o_P(1)\), and \(\widehat{F}_i(x_i) - F_i(x_i) = o_P(1)\). It follows that
where the second term is \(o_P(1)\) for all x such that \(F(x) \in (0,1)\). We can then write
from which it follows that
By Slutsky’s Theorem, we have that \(\widehat{f}(\mathbf {x})\) as defined by Eq. (8) is asymptotically normal. The expression for the asymptotic variance of the density estimate follows from Theorem 3, Lemma 1, and the delta method applied to the asymptotic covariance matrix of the local correlations (11), using the function (1).
1.5 A.5: Proof of Theorem 5
Stone (1990) provides large sample theory for logspline density estimates. The bias is shown to be asymptotically negligible provided that the true density is twice continuously differentiable. He shows further that \(\widehat{f}_i\) is asymptotically normal with asymptotic variance of order \(O(n^{-(0.5 + \epsilon )})\), where \(\epsilon \in (0,1/2)\) is a tuning parameter that controls the rate at which new nodes are added in the logspline model. Stone (1990) develops theory for compactly supported densities only, but we proceed using a truncation argument to show that the asymptotic normality holds equally well for densities \(f_i\) satisfying \(f_i = o(|z|^{-(5/2 + \gamma )})\), where \(\gamma = 2\epsilon /(1 - 2\epsilon )\) (see below) is close to zero if \(\epsilon \) is small. If \(\epsilon \rightarrow 1/2\), then \(\gamma \rightarrow \infty \), which is intuitively reasonable because the number of nodes will then increase very slowly, meaning that the probability of extreme observations beyond the smallest and largest node must necessarily be small.
Denote by \(J_n\) the number of nodes that are used to fit the logspline model to f(z) based on iid observations \(Z_1, \ldots , Z_n\). Stone (1990) assumes \(J_n=o(n^{0.5-\epsilon })\). Construct a sequence \(L_n\), \(n = 1, 2, \ldots \), that is \(o(J_n)\), and define a new stochastic variable by truncation:
where \(U_n\) is uniformly distributed on \([-L_n, L_n]\). Let \(c_n = \int _{|z|\le L_n}f(z)\,\text {d}z\), then the density of \(Z^{(n)}\) is given by
The index i in \(f_i=f_i(z_i)\) is not important here, and will be omitted. Let \(\widehat{f}_n\) be the logspline estimate of f based on \(Z_1, \ldots , Z_n\), and let \(\widehat{f}_n^{(n)}\) be the logspline estimate of \(f^{(n)}\) based on \(Z_1^{(n)}, \ldots , Z_n^{(n)}\). We wish to show that \(\sqrt{n^{0.5 + \epsilon }}(\widehat{f}(z) - f(z))\) is asymptotically normal, and make the following decomposition:
The first parenthesis converges in probability to zero provided that the tails of f are not too heavy. To see this, assume that there exists a \(z_0>0\) such that \(f(z)<M_1|z|^{-s}\) for all \(|z|>z_0\) and some constant \(M_1\). It follows from elementary calculus that
for a new constant \(M_2\). We have from a Taylor expansion that the right-hand side is \(O(n^{(1/2-\epsilon )(1-s) + 1})\) since \(L_n = o(J_n) = o(n^{0.5-\epsilon })\), and so balancing this with the convergence rate in the normal approximation, \(n^{1/4+\epsilon /2}\), yields \(s^* = 5/2 + 2\epsilon /(1-2\epsilon )\) as the limiting value for s. Thus \(\gamma = 2\epsilon /(1-2\epsilon )\).
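The piece of elementary calculus invoked here is the tail integral \(\int _L^\infty z^{-s}\,\text {d}z = L^{1-s}/(s-1)\) for \(s>1\), so that the tail mass of a density bounded by \(M_1|z|^{-s}\) beyond \(\pm L\) is at most a constant times \(L^{1-s}\). A quick numerical check, with illustrative values of \(s\) and \(L\):

```python
from math import inf
from scipy.integrate import quad

# For s > 1, integral of z^(-s) over (L, inf) equals L^(1-s)/(s-1),
# which gives the O(L^(1-s)) bound on the tail probability
s, L = 3.5, 4.0   # illustrative values with s > 1
val, err = quad(lambda t: t**(-s), L, inf)
exact = L**(1 - s)/(s - 1)
```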
It follows easily that the third parenthesis in (22) converges to zero if we assume that \(f=o(|z|^{-s^*})\).
To see that the middle parenthesis in (22) is asymptotically normal, choose an arbitrary constant \(T>0\). Then there exists a positive integer N such that \([-T,T]\subset [-L_n,L_n]\) for all \(n>N\). Make a new decomposition:
Letting \(n\rightarrow \infty \), the middle term on the right-hand side of (23) is, for any fixed T, asymptotically normally distributed according to the theory of Stone (1990). The first and third parentheses in (23) can be made arbitrarily small by choosing a large enough T. It follows from Slutsky's Theorem that the logspline estimates of the marginal densities are asymptotically normal with convergence rate \(\sqrt{n^{1/2 + \epsilon }}\), provided their tails are thinner than those of \(|z|^{-s^*}\).
The exact same argument as above goes through for the marginal distribution and quantile functions as well, with convergence rate being equal to the usual \(\sqrt{n}\).
Appendix B: Supplementary material
The supplementary file “code.zip” contains the data set that was used in Sect. 6, and necessary computer code for calculating the locally Gaussian density estimates, written in the R programming language (R Core Team 2015).
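For readers without R, the core of the bivariate procedure can be sketched in Python. This is an illustrative reimplementation, not the authors' supplementary code: the marginal transformations use normalized ranks (pseudo-observations) rather than logsplines, a single arbitrary bandwidth is used, the penalty integral in the local likelihood is computed exactly for a Gaussian product kernel, and the estimate is returned on the z-scale, since the back-transformation (8) would additionally require marginal density estimates.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def psi(zz, rho):
    # Bivariate Gaussian density with standard normal margins, correlation rho
    q = zz[:, 0]**2 - 2*rho*zz[:, 0]*zz[:, 1] + zz[:, 1]**2
    return np.exp(-q/(2*(1 - rho**2)))/(2*np.pi*np.sqrt(1 - rho**2))

def lgde_bivariate(X, x, h=0.8):
    """Sketch of a locally Gaussian density estimate of a 2-d density at x."""
    n = X.shape[0]
    # Step 1: transform the margins towards standard normality via ranks
    Z = np.empty_like(X, dtype=float)
    z = np.empty(2)
    for j in range(2):
        srt = np.sort(X[:, j])
        Z[:, j] = norm.ppf(np.searchsorted(srt, X[:, j], side='right')/(n + 1))
        z[j] = norm.ppf(np.searchsorted(srt, x[j], side='right')/(n + 1))

    # Step 2: local likelihood fit of the correlation at z; for a Gaussian
    # product kernel the penalty integral is the N(0, S + h^2 I) density at z
    K = np.exp(-0.5*np.sum(((Z - z)/h)**2, axis=1))/(2*np.pi*h*h)

    def neg_loc_lik(rho):
        S = np.array([[1 + h*h, rho], [rho, 1 + h*h]])
        pen = np.exp(-(z @ np.linalg.solve(S, z))/2) \
              / (2*np.pi*np.sqrt(np.linalg.det(S)))
        return -(np.mean(K*np.log(psi(Z, rho))) - pen)

    rho_hat = minimize_scalar(neg_loc_lik, bounds=(-0.99, 0.99),
                              method='bounded').x

    # Step 3: density estimate on the z-scale
    return psi(z[None, :], rho_hat)[0], rho_hat

# Illustration on simulated Gaussian data with true correlation 0.5
rng = np.random.default_rng(0)
C = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
X = rng.standard_normal((2000, 2)) @ C.T
f_hat, rho_hat = lgde_bivariate(X, np.array([0.0, 0.0]))
```

For truly Gaussian data the population maximiser of the local likelihood is the global correlation, so the fitted local correlation at the centre should land near 0.5, and the z-scale density near the bivariate Gaussian value at the origin.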
Otneim, H., Tjøstheim, D. The locally Gaussian density estimator for multivariate data. Stat Comput 27, 1595–1616 (2017). https://doi.org/10.1007/s11222-016-9706-6