Abstract
The choice of a bandwidth matrix is one of the most significant problems in multivariate kernel regression. The difficulty lies in the fact that the theoretically optimal bandwidth matrix depends on the unknown regression function to be estimated, so data-driven methods must be applied. The method proposed here is based on a relation between the asymptotic integrated square bias and the asymptotic integrated variance. Statistical properties of this method are also treated. The last two sections are devoted to simulations and an application to real data.






References
Aldershof B, Marron J, Park B, Wand M (1995) Facts about the Gaussian probability density function. Appl Anal 59:289–306
Chacón JE, Duong T, Wand MP (2011) Asymptotics for general multivariate kernel density derivative estimators. Stat Sin 21(2):807–840
Chiu S (1990) Why bandwidth selectors tend to choose smaller bandwidths, and a remedy. Biometrika 77(1):222–226
Chiu S (1991) Some stabilized bandwidth selectors for nonparametric regression. Ann Stat 19(3):1528–1546
Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31(4):377–403
Droge B (1996) Some comments on cross-validation. Tech. Rep. 1994-7, Humboldt Universitaet Berlin. http://ideas.repec.org/p/wop/humbsf/1994-7.html
Duong T, Hazelton M (2005a) Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation. J Multivar Anal 93(2):417–433
Duong T, Hazelton M (2005b) Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand J Stat 32(3):485–506
Fan J (1993) Local linear regression smoothers and their minimax efficiencies. Ann Stat 21(1):196–216
Gasser T, Müller HG (1979) Kernel estimation of regression functions. In: Gasser T, Rosenblatt M (eds) Smoothing techniques for curve estimation. Lecture Notes in Mathematics, vol 757. Springer, Berlin, pp 23–68
Härdle W (1990) Applied nonparametric regression, 1st edn. Cambridge University Press, Cambridge
Härdle W (2004) Nonparametric and semiparametric models. Springer, Berlin
Herrmann E, Engel J, Wand M, Gasser T (1995) A bandwidth selector for bivariate kernel regression. J R Stat Soc Ser B (Methodological) 57:171–180
Horová I, Koláček J, Vopatová K (2013) Full bandwidth matrix selectors for gradient kernel density estimate. Comput Stat Data Anal 57(1):364–376
Horová I, Zelinka J (2007) Contribution to the bandwidth choice for kernel density estimates. Comput Stat 22(1):31–47
Jones MC, Kappenman RF (1991) On a class of kernel density estimate bandwidth selectors. Scand J Stat 19(4):337–349
Jones MC, Marron JS, Park BU (1991) A simple root n bandwidth selector. Ann Stat 19(4):1919–1932
Koláček J (2005) Kernel estimation of the regression function (in Czech). Ph.D. thesis, Masaryk University, Brno
Koláček J (2008) Plug-in method for nonparametric regression. Comput Stat 23(1):63–78
Koláček J, Horová I (2016) Selection of bandwidth for kernel regression. Commun Stat Theory Methods 45(5):1487–1500
Köhler M, Schindler A, Sperlich S (2014) A review and comparison of bandwidth selection methods for kernel regression. Int Stat Rev 82(2):243–274
Lafferty J, Wasserman L (2008) Rodeo: sparse, greedy nonparametric regression. Ann Stat 36:28–63
Lau G, Ooi PL, Phoon B (1998) Fatal falls from a height: the use of mathematical models to estimate the height of fall from the injuries sustained. Forensic Sci Int 93(1):33–44
Magnus JR, Neudecker H (1979) The commutation matrix: some properties and applications. Ann Stat 7(2):381–394
Magnus JR, Neudecker H (1999) Matrix differential calculus with applications in statistics and econometrics, 2nd edn. Wiley, New York
Manteiga WG, Miranda MM, González AP (2004) The choice of smoothing parameter in nonparametric regression through wild bootstrap. Comput Stat Data Anal 47(3):487–515
Rice J (1984) Bandwidth choice for nonparametric regression. Ann Stat 12(4):1215–1230
Ruppert D (1997) Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. J Am Stat Assoc 92(439):1049–1062
Ruppert D, Wand MP (1994) Multivariate locally weighted least squares regression. Ann Stat 22:1346–1370
Seifert B, Gasser T (1996) Variance properties of local polynomials and ensuing modifications. In: Härdle W, Schimek M (eds) Statistical theory and computational aspects of smoothing, contributions to statistics. Physica, Heidelberg, pp 50–79
Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J R Stat Soc Ser B (Methodological) 47:1–52
Simonoff JS (1996) Smoothing methods in statistics. Springer, New York
Staniswalis JG, Messer K, Finston DR (1993) Kernel estimators for multivariate regression. J Nonparametric Stat 3(2):103–121
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Stat Methodol 36(2):111–147
Wand M, Jones M (1993) Comparison of smoothing parameterizations in bivariate kernel density-estimation. J Am Stat Assoc 88(422):520–528
Wand M, Jones M (1995) Kernel smoothing. Chapman and Hall, London
Wand MP, Jones MC (1994) Multivariate plug-in bandwidth selection. Comput Stat 9(2):97–116
Yang L, Tschernig R (1999) Multivariate bandwidth selection for local linear regression. J R Stat Soc Ser B (Stat Methodol) 61(4):793–815
Zhang X, Brooks RD, King ML (2009) A Bayesian approach to bandwidth selection for multivariate kernel regression with an application to state-price density estimation. J Econom 153(1):21–32
Acknowledgements
This research was supported by Masaryk University, Project GAČR GA15-06991S.
Appendix: Proofs
First, we introduce some facts on matrix differential calculus and on the Gaussian density (see Magnus and Neudecker 1979, 1999; Aldershof et al. 1995).
Let \(\mathbf {A},\, \mathbf {B}\) be \(d\times d\) matrices:
- \(1^\circ \):
  \(\displaystyle {{\mathrm{tr}}}(\mathbf {A}^T \mathbf {B})=\mathrm {vec}^T \mathbf {A}\,\mathrm {vec}\,\mathbf {B}\)
- \(2^\circ \):
  \(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}(\mathbf {H}^{1/2} D^2 \mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=c\{{{\mathrm{tr}}}(\mathbf {H}D^2)\}m(\mathbf {x})\)
  \(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}^2(\mathbf {H}^{1/2} D^2 \mathbf {H}^{1/2}\mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=3c^2\{{{\mathrm{tr}}}^2(\mathbf {H}D^2)\}m(\mathbf {x})\)
  \(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}^k(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) {{\mathrm{tr}}}(\mathbf {H}^{1/2}D\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=0,\quad c,\,k\in \mathbb {N}_0\)
- \(3^\circ \):
  \(\displaystyle \int D^km(\mathbf {x})[D^km(\mathbf {x})]^T\,d\mathbf {x}= (-1)^k\int D^{2k}m(\mathbf {x})\,m(\mathbf {x})\,d\mathbf {x},\quad k\in \mathbb {N}\)
- \(4^\circ \): Let
  \(\displaystyle \varLambda (\mathbf {z})=\phi _{4\mathbf {I}}(\mathbf {z})-2\phi _{3\mathbf {I}}(\mathbf {z})+\phi _{2\mathbf {I}}(\mathbf {z})\);
  then using \(2^\circ \) yields
  \(\displaystyle \int \varLambda (\mathbf {z})\,d\mathbf {z}= 0\)
  \(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}= 0\)
  \(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}^2(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=6\{{{\mathrm{tr}}}^2(\mathbf {H}D^2)\}m(\mathbf {x})\)
  \(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}^k(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) {{\mathrm{tr}}}(\mathbf {H}^{1/2}D\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=0,\quad k\in \mathbb {N}_0\)
- \(5^\circ \): The Taylor expansion takes the form
$$\begin{aligned} \begin{aligned} m(\mathbf {x}-\mathbf {H}^{1/2}\mathbf {z})&= m(\mathbf {x})-\{\mathbf {z}^T\mathbf {H}^{1/2}D\}m(\mathbf {x})\\&\quad + \frac{1}{2!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^2\}m(\mathbf {x})+\dots \\&\quad +\frac{(-1)^k}{k!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^k\}m(\mathbf {x})\\&\quad + o(||\mathbf {H}^{1/2}\mathbf {z}||^k). \end{aligned} \end{aligned}$$
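These identities can be checked numerically. The sketch below (NumPy; an illustration only, not part of the proofs) verifies \(1^\circ \) exactly and the Gaussian moment identities of \(2^\circ \) by Monte Carlo. The fixed symmetric matrix \(\mathbf {A}\) is a stand-in for the operator \(\mathbf {H}^{1/2}D^2\mathbf {H}^{1/2}\), so the fourth moment appears in its matrix form \(c^2\{{{\mathrm{tr}}}^2\mathbf {A}+2\,{{\mathrm{tr}}}(\mathbf {A}^2)\}\); in the operator setting the two trace terms coincide because partial derivatives commute, which gives the factor \(3c^2\) above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
A = (A + A.T) / 2                                  # symmetric stand-in for H^{1/2} D^2 H^{1/2}

# 1°: tr(A^T B) = vec(A)^T vec(B)  (exact identity)
B = rng.standard_normal((d, d))
assert np.isclose(np.trace(A.T @ B),
                  A.flatten(order="F") @ B.flatten(order="F"))

# 2°: for z ~ N(0, cI),  E[tr(A z z^T)] = E[z^T A z] = c tr(A)
c = 2.0
z = np.sqrt(c) * rng.standard_normal((200_000, d))
quad = np.einsum("ni,ij,nj->n", z, A, z)           # z^T A z, one value per sample
print(quad.mean(), c * np.trace(A))                # ≈ equal

# E[(z^T A z)^2] = c^2 [(tr A)^2 + 2 tr(A^2)]; equals 3 c^2 tr^2 for the operator
print((quad ** 2).mean(),
      c**2 * (np.trace(A)**2 + 2 * np.trace(A @ A)))

# odd moment: E[(z^T A z)^k (a^T z)] = 0 for any fixed vector a (here k = 1)
a = rng.standard_normal(d)
print((quad * (z @ a)).mean())                     # ≈ 0
```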
Lemma 2
The integrated square bias can be expressed as
where the symbol \(*\) denotes convolution.
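The convolutions appearing here involve only Gaussian densities, for which \(\phi _{a\mathbf {I}}*\phi _{b\mathbf {I}}=\phi _{(a+b)\mathbf {I}}\) (the density of a sum of independent Gaussians); one can also observe that the function \(\varLambda \) of \(4^\circ \) factors as \((\phi _{2\mathbf {I}}-\phi _{\mathbf {I}})*(\phi _{2\mathbf {I}}-\phi _{\mathbf {I}})\). A minimal one-dimensional grid check of both facts (an illustration only):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-12, 12, 4001)
dx = x[1] - x[0]
phi = lambda c: norm.pdf(x, scale=np.sqrt(c))      # N(0, c) density

# phi_a * phi_b = phi_{a+b}
conv = np.convolve(phi(2.0), phi(3.0), mode="same") * dx
print(np.max(np.abs(conv - phi(5.0))))             # ≈ 0 up to grid error

# Lambda = phi_4 - 2 phi_3 + phi_2 = (phi_2 - phi_1) * (phi_2 - phi_1)
lam = phi(4.0) - 2 * phi(3.0) + phi(2.0)
lam2 = np.convolve(phi(2.0) - phi(1.0),
                   phi(2.0) - phi(1.0), mode="same") * dx
print(np.max(np.abs(lam - lam2)))                  # ≈ 0 up to grid error
```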
Proof
Each integral in the sum can be approximated in the following way:
Thus
Further
Sketch of the proof of Theorem 1:
Proof
In order to show that \(\widehat{\varGamma }\) is an asymptotically unbiased estimator of \(\varGamma \), we evaluate \(E(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))\):
Applying the Taylor expansion from \(5^{\circ }\) and using \(4^\circ \) yields
Using properties \(3^\circ \) and \(4^\circ \), we arrive at
To finish the proof of Theorem 1, it suffices to derive \({{\mathrm{Var}}}(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))\):
Since the estimator \(\widehat{{{\mathrm{AISB}}}}(\mathbf {H})\) is asymptotically unbiased and its variance tends to zero, the estimator is consistent. \(\square \)
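The consistency conclusion here, and likewise in the proof of Lemma 1 below, rests on the standard mean-square argument, spelled out for completeness:
$$\begin{aligned} E\bigl (\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})\bigr )^2 ={{\mathrm{Var}}}\bigl (\widehat{{{\mathrm{AISB}}}}(\mathbf {H})\bigr ) +\bigl [E\,\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})\bigr ]^2 \longrightarrow 0, \end{aligned}$$
and convergence in mean square implies convergence in probability by Chebyshev's inequality, \(P\bigl (|\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})|>\varepsilon \bigr )\le \varepsilon ^{-2}\, E\bigl (\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})\bigr )^2\).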
Sketch of the proof of Lemma 1:
Proof
In order to show that \(\widehat{\psi }_{4,0}\) is an asymptotically unbiased estimator of \(\psi _{4,0}\), we evaluate \(E(\widehat{\psi }_{4,0})\):
Applying the Taylor expansion from \(5^{\circ }\) yields
To finish the proof of Lemma 1, it suffices to derive \({{\mathrm{Var}}}(\widehat{\psi }_{4,0})\):
Since the estimator \(\widehat{\psi }_{4,0}\) is asymptotically unbiased and its variance tends to zero, the estimator is consistent. \(\square \)