Conditional density estimation using the local Gaussian correlation


Abstract

Let \(\mathbf {X} = (X_1,\ldots ,X_p)\) be a stochastic vector having joint density function \(f_{\mathbf {X}}(\mathbf {x})\) with partitions \(\mathbf {X}_1 = (X_1,\ldots ,X_k)\) and \(\mathbf {X}_2 = (X_{k+1},\ldots ,X_p)\). A new method for estimating the conditional density function of \(\mathbf {X}_1\) given \(\mathbf {X}_2\) is presented. It is based on locally Gaussian approximations, but simplified in order to tackle the curse of dimensionality in multivariate applications, where both response and explanatory variables can be vectors. We compare our method to some available competitors; the approximation error is shown to be small in a series of examples using real and simulated data, and the estimator proves particularly robust against noise caused by independent variables. We also present examples of practical applications of our conditional density estimator in the analysis of time series. Typical values of k in our examples are 1 and 2, and we include simulation experiments with values of p up to 6. Large sample theory is established under a strong mixing condition.




References

  • Bashtannyk, D.M., Hyndman, R.J.: Bandwidth selection for kernel conditional density estimation. Comput. Stat. Data Anal. 36(3), 279–298 (2001)

  • Berentsen, G.D., Cao, R., Francisco-Fernández, M., Tjøstheim, D.: Some properties of local Gaussian correlation and other nonlinear dependence measures. J. Time Ser. Anal. 38, 352–380 (2017). doi:10.1111/jtsa.12183

  • Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods. Springer, Berlin (2013)

  • Bücher, A., Volgushev, S.: Empirical and sequential empirical copula processes under serial dependence. J. Multivar. Anal. 119, 61–70 (2013)

  • Chacón, J.E., Duong, T.: Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices. Test 19(2), 375–398 (2010)

  • Chen, X., Linton, O.B.: The estimation of conditional densities. LSE STICERD Research Paper No. EM415 (2001)

  • Dette, H., Van Hecke, R., Volgushev, S.: Some comments on copula-based regression. J. Am. Stat. Assoc. 109(507), 1319–1324 (2014)

  • Fan, J., Yao, Q.: Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, Berlin (2003)

  • Fan, J., Yim, T.H.: A crossvalidation method for estimating conditional densities. Biometrika 91(4), 819–834 (2004)

  • Fan, J., Yao, Q., Tong, H.: Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika 83(1), 189–206 (1996)

  • Faugeras, O.P.: A quantile-copula approach to conditional density estimation. J. Multivar. Anal. 100(9), 2083–2099 (2009)

  • Geenens, G., Charpentier, A., Paindaveine, D.: Probit transformation for nonparametric kernel estimation of the copula density. arXiv preprint arXiv:1404.4414 (2014)

  • Hall, P.: On Kullback–Leibler loss and density estimation. Ann. Stat. 15(4), 1491–1519 (1987)

  • Hall, P., Racine, J.S., Li, Q.: Cross-validation and the estimation of conditional probability densities. J. Am. Stat. Assoc. 99(468), 1015–1026 (2004)

  • Hayfield, T., Racine, J.S.: Nonparametric econometrics: the np package. J. Stat. Softw. 27(5), 1–32 (2008)

  • Hjort, N.L., Jones, M.C.: Locally parametric nonparametric density estimation. Ann. Stat. 24(4), 1619–1647 (1996)

  • Holmes, M.P., Gray, A.G., Isbell, C.L.: Fast nonparametric conditional density estimation. arXiv preprint arXiv:1206.5278 (2012)

  • Hothorn, T., Kneib, T., Bühlmann, P.: Conditional transformation models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 76(1), 3–27 (2014)

  • Hyndman, R.J., Yao, Q.: Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat. 14(3), 259–278 (2002)

  • Hyndman, R.J., Bashtannyk, D.M., Grunwald, G.K.: Estimating and visualizing conditional densities. J. Comput. Graph. Stat. 5(4), 315–336 (1996)

  • Irle, A.: On consistency in nonparametric estimation under mixing conditions. J. Multivar. Anal. 60(1), 123–147 (1997)

  • Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis, 6th edn. Pearson Education International, New York (2007)

  • Lacal, V., Tjøstheim, D.: Local Gaussian autocorrelation and tests for serial independence. J. Time Ser. Anal. 38(1), 51–71 (2017). doi:10.1111/jtsa.12195

  • Nelsen, R.B.: An Introduction to Copulas, vol. 139. Springer, Berlin (2013)

  • Newey, W.K.: Uniform convergence in probability and stochastic equicontinuity. Econometrica 59(4), 1161–1167 (1991)

  • Noh, H., El Ghouch, A., Bouezmarni, T.: Copula-based regression estimation and inference. J. Am. Stat. Assoc. 108(502), 676–688 (2013)

  • Otneim, H., Tjøstheim, D.: The locally Gaussian density estimator for multivariate data. Stat. Comput. 1–22 (2016). doi:10.1007/s11222-016-9706-6

  • Palaro, H.P., Hotta, L.K.: Using conditional copula to estimate value at risk. J. Data Sci. 4, 93–115 (2006)

  • Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)

  • Peligrad, M.: On the central limit theorem for weakly dependent sequences with a decomposed strong mixing coefficient. Stoch. Process. Appl. 42(2), 181–193 (1992)

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2015). https://www.R-project.org/

  • Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27(3), 832–837 (1956)

  • Rosenblatt, M.: Conditional probability density and regression estimators. In: Multivariate Analysis II, pp. 25–31 (1969)

  • Ruppert, D., Cline, D.B.H.: Bias reduction in kernel density estimation by smoothed empirical transformations. Ann. Stat. 22(1), 185–210 (1994)

  • Schervish, M.J.: Theory of Statistics. Springer, Berlin (1995)

  • Severini, T.A.: Likelihood Methods in Statistics. Oxford University Press, Oxford (2000)

  • Sheather, S.J., Jones, M.C.: A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B (Methodol.) 53(3), 683–690 (1991)

  • Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Monogr. Stat. Appl. Probab. 26 (1986)

  • Stone, C.J.: Large-sample inference for log-spline models. Ann. Stat. 18(2), 717–741 (1990)

  • Stone, C.J., Hansen, M.H., Kooperberg, C., Truong, Y.K.: Polynomial splines and their tensor products in extended linear modeling: 1994 Wald Memorial Lecture. Ann. Stat. 25(4), 1371–1470 (1997)

  • Tjøstheim, D., Hufthammer, K.O.: Local Gaussian correlation: a new measure of dependence. J. Econom. 172(1), 33–48 (2013)

  • Wand, M.P., Jones, M.C.: Multivariate plug-in bandwidth selection. Comput. Stat. 9(2), 97–116 (1994)

  • Wand, M.P., Marron, J.S., Ruppert, D.: Transformations in density estimation. J. Am. Stat. Assoc. 86(414), 343–353 (1991)


Acknowledgements

The authors would like to thank two anonymous referees, who provided several useful comments and suggestions in the preparation of this article. This work has been partly supported by The Finance Market Fund, project number 261570.


Corresponding author

Correspondence to Håkon Otneim.

Electronic supplementary material

Below is the link to the electronic supplementary material.

11222_2017_9732_MOESM1_ESM.zip

Supplementary material: The file code.zip, which accompanies this article, contains the data sets that have been used, as well as routines implementing the conditional density estimator in the R programming language (R Core Team 2015). (zip 50KB)

Appendices

Appendix 1: Proofs

1.1 Proof of Theorem 1

Except for a slight modification that accounts for the replacement of independence with \(\alpha \)-mixing, the proof of Theorem 1 is identical to the corresponding proof in Otneim and Tjøstheim (2016), which in turn is based on the global maximum likelihood case covered by Severini (2000). For each location \(\mathbf {z}\), which for simplicity we suppress from the notation, denote by \(Q_{\mathbf {h}_n,K}(\rho )\) the expectation of the local likelihood function \(L_n(\rho , \mathbf {Z})\). Consistency follows from uniform convergence in probability of \(L_n(\rho , \mathbf {Z})\) toward \(Q_{\mathbf {h}_n,K}(\rho )\), conditions for which are provided in Corollary 2.2 by Newey (1991).

The result requires compact support of the parameter space, equicontinuity and Lipschitz continuity of the family of functions \(\{Q_{\mathbf {h}_n, K}(\rho )\}\), as well as pointwise convergence of the local likelihood functions. Compactness is covered by Assumption D, and the demonstration of equi- and Lipschitz continuity in Otneim and Tjøstheim (2016) does not rely on the independent data assumption. In the independent case, pointwise convergence follows from a standard nonparametric law of large numbers. Our Assumption B about \(\alpha \)-mixing data ensures that pointwise convergence still holds; see, for example, Theorem 1 by Irle (1997), whose conditions are straightforward to verify in our local likelihood setting.
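To make these quantities concrete, the following R sketch illustrates a bivariate sample local likelihood of the Hjort and Jones (1996) type, on which the estimator is built, evaluated and maximized at a single location \(\mathbf {z}\). It is our own illustration and not an excerpt from the supplementary code; the function names and the grid approximation of the penalty integral are ours, and a serious implementation would use a proper numerical integration routine.

```r
## Illustrative sketch (not the authors' code): sample local likelihood
## L_n(rho, Z) of one local Gaussian correlation at a fixed point z.

## Bivariate normal density with standard normal margins and correlation rho
psi <- function(z1, z2, rho) {
  exp(-(z1^2 - 2 * rho * z1 * z2 + z2^2) / (2 * (1 - rho^2))) /
    (2 * pi * sqrt(1 - rho^2))
}

local_loglik <- function(rho, Z, z, h, grid = seq(-5, 5, length.out = 101)) {
  ## Product Gaussian kernel weights centred at z
  K <- dnorm((Z[, 1] - z[1]) / h) * dnorm((Z[, 2] - z[2]) / h) / h^2
  term1 <- mean(K * log(psi(Z[, 1], Z[, 2], rho)))
  ## Penalty integral of K_h(y - z) * psi(y; rho), approximated on a grid
  g     <- expand.grid(y1 = grid, y2 = grid)
  Kg    <- dnorm((g$y1 - z[1]) / h) * dnorm((g$y2 - z[2]) / h) / h^2
  term2 <- sum(Kg * psi(g$y1, g$y2, rho)) * diff(grid)[1]^2
  term1 - term2
}

## Toy marginally standard normal data with true correlation 0.5
set.seed(1)
Z <- matrix(rnorm(1000), ncol = 2)
Z[, 2] <- 0.5 * Z[, 1] + sqrt(0.75) * Z[, 2]

## Local correlation estimate at the origin
rho_hat <- optimize(local_loglik, interval = c(-0.99, 0.99), maximum = TRUE,
                    Z = Z, z = c(0, 0), h = 1)$maximum
```

In this notation, \(Q_{\mathbf {h}_n,K}(\rho )\) is the expectation of the sample quantity computed by local_loglik for fixed \(\rho \), and the consistency argument above concerns the uniform (in \(\rho \)) convergence of the sample version toward it.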

The rest of the proof is identical to the corresponding argument by Severini (2000, pp. 105–107).

1.2 Proof of Theorem 2

Consider first the bivariate case, in which there is only one local correlation to estimate. The first part of the proof goes through exactly as in the iid case of Otneim and Tjøstheim (2016). We follow the argument for global maximum likelihood estimators as presented in Theorem 7.63 by Schervish (1995). The statement of Theorem 2 follows provided that

$$\begin{aligned} Y_n(\mathbf {z}) = \sum _{i=1}^nK\left( |\mathbf {h}_n|^{-1}(\mathbf {Z}_i - \mathbf {z})\right) u(\mathbf {Z}_i,\rho _0) = \sum _{i=1}^nV_{ni},\nonumber \\ \end{aligned}$$
(17)

is asymptotically normal; the reduction of Theorem 2 to this statement follows from a standard Taylor expansion. In the iid case, the limiting distribution of (17) is derived using the same technique as when demonstrating asymptotic normality of the standard kernel estimator, for example as in the proof of Theorem 1A by Parzen (1962). In the case of \(\alpha \)-mixing data, however, we establish asymptotic normality of (17) by going through the steps used in proving Theorem 2.22 in Fan and Yao (2003). Let \(W_i = h^{-1}V_{ni}\). Then

$$\begin{aligned} \frac{1}{nh^2}\text {Var}(Y_n(\mathbf {z}))&= \frac{1}{nh^2} \Bigg \{ \sum _{i=1}^n \text {Var}(V_{ni}) \\&\qquad + 2{\sum \sum }_{1\le i < j \le n}\text {Cov}(V_{ni},V_{nj})\Bigg \} \\&= \text {Var}(W_1) + 2\sum _{j=1}^n (1-j/n)\text {Cov}(W_1, W_{j+1}), \end{aligned}$$

where

$$\begin{aligned} \text {Var}(W_1)&= \textit{E}(W_1^2) - (\textit{E}(W_1))^2 \\&= \int h^{-2}u^2(\mathbf {y}, \rho _0)K^2(h^{-1}(\mathbf {y} - \mathbf {z})) f(\mathbf {y}) \,\text {d}\mathbf {y} + O(h^2) \\&= \int u^2(\mathbf {z} + h\mathbf {v}, \rho _0)K^2(\mathbf {v})f(\mathbf {z} + h\mathbf {v})\, \text {d}\mathbf {v} + O(h^2) \\&\quad \rightarrow u^2(\mathbf {z}, \rho _0)f(\mathbf {z})\int K^2(\mathbf {v})\, \text {d}\mathbf {v} \mathop {=}\limits ^{\text {def}} M(\mathbf {z}) \,\,\text {as}\,\, \mathbf {h}\rightarrow 0, \end{aligned}$$

and

$$\begin{aligned} |\text {Cov}(W_1, W_{j+1})|= & {} |\textit{E}(W_1W_{j+1}) - \textit{E}(W_1)\textit{E}(W_{j+1})|\\= & {} O(h^2), \end{aligned}$$

using the same argument once again. Therefore,

$$\begin{aligned} \left| \sum _{j=1}^{m_n}\text {Cov}(W_1,W_{j+1})\right| = O(m_nh^2). \end{aligned}$$

Fan and Yao (2003) require that

$$\begin{aligned} \textit{E}(u(\mathbf {Z}_n, \rho _0(\mathbf {z}))^{\delta })<\infty \end{aligned}$$
(18)

for some \(\delta >2\), but this is of course true for our transformed data, because they are marginally normal. In Proposition 2.5(i) by Fan and Yao (2003) we can therefore use \(p=q=\delta >2\) in order to obtain, for some constant C,

$$\begin{aligned} |\text {Cov}(W_1, W_{j+1})|\le C\alpha (j)^{1-2/\delta }h^{4/\delta -2}. \end{aligned}$$

Let \(m_n=(h_n^2|\log h_n^2|)^{-1}\). Then \(m_n\rightarrow \infty \), \(m_nh^2\rightarrow 0\), and

$$\begin{aligned}&\sum _{j=m_n+1}^{n-1}|\text {Cov}(W_1,W_{j+1})| \end{aligned}$$
(19)
$$\begin{aligned}&\quad \le C\frac{h^{4/\delta -2}}{m_n^{\lambda }}\sum _{j=m_n+1}^nj^{\lambda } \alpha (j)^{1-2/\delta }\rightarrow 0, \end{aligned}$$
(20)

which follows from Assumption B. Thus,

$$\begin{aligned} \sum _{j=1}^{n-1}\text {Cov}(W_1, W_{j+1})\rightarrow 0, \end{aligned}$$

and it follows that

$$\begin{aligned} \frac{1}{nh^2}\text {Var}(Y_n(\mathbf {z})) = M(\mathbf {z})(1+o(1)). \end{aligned}$$

The proof now continues exactly as in Fan and Yao (2003) using the “big block small block” technique, but with the obvious replacement of h with \(h^2\) to accommodate the bivariate case.

We extend the argument to the multivariate case using the Cramér–Wold device. Let \(\mathbf {\rho } = (\rho _1, \ldots , \rho _d)^T\) be the vector of local correlations, where \(d = p(p-1)/2\), write \(\mathbf {u}(\mathbf {z}, \mathbf {\rho }_0) = (u_1(\mathbf {z}, \mathbf {\rho }_0), \ldots , u_d(\mathbf {z}, \mathbf {\rho }_0))\) and let \(\mathbf {S}_{n}(\mathbf {z}) = \{S_{ni}(\mathbf {z})\}_{i=1}^d\), where

$$\begin{aligned} S_{ni} = \sum _{t=1}^nu_i(\mathbf {Z}_t, \mathbf {\rho }_0)K(|\mathbf {h}|^{-1}(\mathbf {Z}_t - \mathbf {z})). \end{aligned}$$

We must show that

$$\begin{aligned} \sum _ka_kS_{nk} \mathop {\rightarrow }\limits ^{\mathcal {L}} \sum _ka_kZ_k^*, \end{aligned}$$
(21)

where \(\mathbf {a} = (a_1, \ldots , a_d)^T\) is an arbitrary vector of constants, and \(\mathbf {Z}^* = (Z_1^*, \ldots , Z_d^*)\) is a jointly normally distributed random vector. Because of Slutsky's Theorem, it suffices to show that the left-hand side of (21) is asymptotically normal. This follows from observing that it has the same form as the original sums \(S_{ni}\), with

$$\begin{aligned} \sum _ka_kS_{nk} = \sum _{t=1}^nu^*(\mathbf {Z}_t, \mathbf {\rho }_0)K(|\mathbf {h}|^{-1}(\mathbf {Z}_t-\mathbf {z})), \end{aligned}$$

where \(u^*(\mathbf {Z}_t, \mathbf {\rho }_0) = \sum _ka_ku_k(\mathbf {Z}_t,\mathbf {\rho }_0)\). It is well known that any measurable mapping of a mixing sequence of random variables inherits the mixing properties of the original series, so Condition B is satisfied by the linear combination as well. The new sequence of observations satisfies (18), because it follows from Jensen's inequality, applied with the weights \(a_k/\sum _ka_k\) (for \(a_k\) of mixed signs, the same bound holds with \(a_k\) and \(u_k\) replaced by their absolute values), that for \(\delta >2\),

$$\begin{aligned} \left[ \frac{u^*(\mathbf {Z}_t, \mathbf {\rho }_0)}{\sum _ka_k}\right] ^{\delta }&= \left[ \frac{\sum _ka_ku_k(\mathbf {Z}_t, \mathbf {\rho }_0)}{\sum _ka_k}\right] ^{\delta } \\&\le \frac{\sum _ka_k[u_k(\mathbf {Z}_t,\mathbf {\rho }_0)]^{\delta }}{\sum _ka_k}, \end{aligned}$$

so that

$$\begin{aligned} \textit{E}[u^*(\mathbf {Z}_t,\mathbf {\rho }_0)]^{\delta } \le \sum _ka_k\textit{E}[u_k(\mathbf {Z}_t,\mathbf {\rho }_0)]^{\delta }\left[ \sum _ka_k\right] ^{\delta - 1}<\infty . \end{aligned}$$

The off-diagonal elements in the asymptotic covariance matrix are zero using the same arguments as in Otneim and Tjøstheim (2016).

1.3 Proof of Theorem 3

The key to proving Theorem 3 is to show that the asymptotic distribution of (17) remains unchanged when the marginally standard normal stochastic vectors \(\mathbf {Z}_j\) are replaced with the pseudo-observations

$$\begin{aligned} \widehat{\mathbf {Z}}_j = \left( \varPhi ^{-1}(\widehat{F}_1(X_{j1})), \ldots , \varPhi ^{-1}(\widehat{F}_p(X_{jp}))\right) ^\mathrm{T}, \end{aligned}$$

where \(\widehat{F}_i(\cdot )\), \(i=1,\ldots ,p\), are the marginal empirical distribution functions. This is shown in the independent case under Assumptions F–G in Otneim and Tjøstheim (2016), by providing a slight modification to Proposition 3.1 by Geenens et al. (2014). The essence of that proof is the convergence of the empirical copula process, which remains unchanged if we replace the assumption of independent observations with \(\alpha \)-mixing, according to Bücher and Volgushev (2013).
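For concreteness, the pseudo-observations take only a few lines of R to compute. The sketch below is ours; in particular, the \(n/(n+1)\) rescaling of the empirical distribution functions is a common convention, assumed here to keep \(\varPhi ^{-1}\) finite, and not a prescription taken from the paper.

```r
## Sketch: pseudo-observations from a raw data matrix X (n x p).
## Column-wise ranks divided by (n + 1) stand in for the marginal
## empirical distribution functions; qnorm maps them to the normal scale.
pseudo_obs <- function(X) {
  n <- nrow(X)
  qnorm(apply(X, 2, rank) / (n + 1))
}
```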

The multivariate delta method states that if \(\sqrt{nh^2}(\theta _n - \theta ) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, A)\) and \(q:R^d\rightarrow R\) has continuous first partial derivatives, then \(\sqrt{nh^2}(q(\theta _n) - q(\theta )) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, \nabla q(\theta )^TA\nabla q(\theta ))\) (Schervish 1995, p. 403). In our case, \(q(\mathbf {\rho }) = \varPsi (\mathbf {z}, \mathbf {R})g(\mathbf {x})\), and

$$\begin{aligned} \nabla q(\mathbf {\rho }) = \varPsi (\mathbf {z}, \mathbf {R})g(\mathbf {x})\mathbf {u}(\mathbf {z}, \mathbf {R}), \end{aligned}$$

from which the result follows immediately.
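As an illustrative numerical check of the delta method step (in the scalar case, and separate from the proof itself), take \(\theta _n\) to be a sample mean with \(\sqrt{n}(\theta _n - \theta ) \rightarrow N(0, A)\) and \(q(t) = e^t\); the simulated variance of \(\sqrt{n}(q(\theta _n) - q(\theta ))\) should match \(q'(\theta )^2A = e^{2\theta }A\). The values below are arbitrary.

```r
## Scalar delta method check: compare the simulated limiting variance of
## sqrt(n) * (exp(mean) - exp(theta)) with the predicted exp(theta)^2 * A.
set.seed(1)
n       <- 5000
theta   <- 0.3
A       <- 2
theta_n <- replicate(2000, mean(rnorm(n, mean = theta, sd = sqrt(A))))
var(sqrt(n) * (exp(theta_n) - exp(theta)))  # simulated variance
exp(theta)^2 * A                            # delta method prediction
```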

Appendix 2: Large sample properties of the logspline estimator

The current implementation of our method in the R programming language (R Core Team 2015) uses the logspline method by Stone et al. (1997) for marginal density estimation. The asymptotic theory for the logspline estimator is derived by Stone (1990), but is restricted to density functions with compact support. Otneim and Tjøstheim (2016) relax this requirement using a truncation argument, so that compact support can be replaced by the assumption that the tails of the unknown density are not too heavy.
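For reference, the logspline estimator is available in the CRAN package logspline by Kooperberg, which implements the method of Stone et al. (1997). The snippet below is our own illustration of the marginal estimation and back-transformation steps, not an excerpt from the paper's supplementary code.

```r
## Sketch: logspline estimation of one marginal, and the normal-score
## transformation Zhat = qnorm(Fhat(X)) used before local fitting.
library(logspline)

set.seed(1)
x   <- rnorm(1000)   # toy marginal sample
fit <- logspline(x)  # fitted logspline density

dlogspline(0, fit)   # estimated marginal density at 0
plogspline(0, fit)   # estimated marginal cdf at 0
zhat <- qnorm(plogspline(x, fit))  # approximately standard normal scores
```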

In particular, Stone (1990) denotes by \(\epsilon \in (0,1/2)\) a tuning parameter that determines the asymptotic rate at which new nodes are added to the logspline procedure. If \(\epsilon \) is close to zero, new nodes are added quickly to the procedure, and as \(\epsilon \rightarrow 1/2\), new nodes are added very slowly. Stone (1990) then provides the following asymptotic results (again, under the assumption that the true density \(f(\mathbf {x})\) has compact support):

$$\begin{aligned} \sqrt{n^{0.5 + \epsilon }}\left( \widehat{f}_i(x) - f(x)\right) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, \sigma _1^2), \end{aligned}$$

and

$$\begin{aligned} \sqrt{n^{0.5}}\left( \widehat{F}_i(x) - F(x)\right) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, \sigma _2^2). \end{aligned}$$

Otneim and Tjøstheim (2016) show that these results hold if there exist constants \(M>0\), \(\gamma > 2\epsilon /(1-2\epsilon )\), and \(x_0>0\) such that \(f(x)\le M|x|^{-(5/2+\gamma )}\) for all \(|x|>x_0\). The “worst case scenario” with respect to Assumption I when using the logspline estimator for the final back-transformation is therefore \(\epsilon \) being close to zero. In that case, we must require the bandwidths to tend to zero fast enough that \(n^{1/2}h^2\rightarrow 0\); on the other hand, this allows \(\gamma \) to approach zero, and thus the tail thickness of the density to approach that of \(|x|^{-5/2}\).

What remains is to show that these results also hold in the case where the observations are \(\alpha \)-mixing. This is easily done by replacing the use of the iid central limit theorem (CLT) in the proof of Theorem 3 in Stone (1990) with a corresponding CLT that holds under our mixing condition. For example, Theorem A by Peligrad (1992) establishes the CLT under \(\alpha \)-mixing provided that the mixing coefficients satisfy \(\sum _{n=1}^{\infty }\alpha (n)^{1-2/\delta } < \infty \), a condition that follows from our Assumption B.
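As a toy numerical illustration (ours, with arbitrarily chosen values): for geometrically decaying mixing coefficients \(\alpha (n) = 0.9^n\) and \(\delta = 3\), the series \(\sum _n\alpha (n)^{1-2/\delta }\) is itself geometric, and its partial sums stabilize quickly.

```r
## Partial sums of alpha(n)^(1 - 2/delta) for alpha(n) = 0.9^n, delta = 3;
## the sequence approaches a finite limit, as the CLT condition requires.
delta <- 3
alpha <- 0.9^(1:200)
cumsum(alpha^(1 - 2 / delta))[c(10, 50, 200)]
```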


Cite this article

Otneim, H., Tjøstheim, D. Conditional density estimation using the local Gaussian correlation. Stat Comput 28, 303–321 (2018). https://doi.org/10.1007/s11222-017-9732-z
