Abstract
The choice of a bandwidth matrix is one of the most significant problems in multivariate kernel regression. The difficulty lies in the fact that the theoretically optimal bandwidth matrix depends on the unknown regression function to be estimated, so data-driven methods must be applied. The method proposed here is based on a relation between the asymptotic integrated square bias and the asymptotic integrated variance. Statistical properties of this method are also treated. The last two sections are devoted to simulations and an application to real data.






References
Aldershof B, Marron J, Park B, Wand M (1995) Facts about the Gaussian probability density function. Appl Anal 59:289–306
Chacón JE, Duong T, Wand MP (2011) Asymptotics for general multivariate kernel density derivative estimators. Stat Sin 21(2):807–840
Chiu S (1990) Why bandwidth selectors tend to choose smaller bandwidths, and a remedy. Biometrika 77(1):222–226
Chiu S (1991) Some stabilized bandwidth selectors for nonparametric regression. Ann Stat 19(3):1528–1546
Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31(4):377–403
Droge B (1996) Some comments on cross-validation. Tech. Rep. 1994-7, Humboldt Universitaet Berlin. http://ideas.repec.org/p/wop/humbsf/1994-7.html
Duong T, Hazelton M (2005a) Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation. J Multivar Anal 93(2):417–433
Duong T, Hazelton M (2005b) Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand J Stat 32(3):485–506
Fan J (1993) Local linear regression smoothers and their minimax efficiencies. Ann Stat 21(1):196–216
Gasser T, Müller HG (1979) Kernel estimation of regression functions. In: Gasser T, Rosenblatt M (eds) Smoothing techniques for curve estimation. Lecture Notes in Mathematics, vol 757. Springer, Berlin, pp 23–68
Härdle W (1990) Applied nonparametric regression, 1st edn. Cambridge University Press, Cambridge
Härdle W (2004) Nonparametric and semiparametric models. Springer, Berlin
Herrmann E, Engel J, Wand M, Gasser T (1995) A bandwidth selector for bivariate kernel regression. J R Stat Soc Ser B (Methodological) 57:171–180
Horová I, Koláček J, Vopatová K (2013) Full bandwidth matrix selectors for gradient kernel density estimate. Comput Stat Data Anal 57(1):364–376
Horová I, Zelinka J (2007) Contribution to the bandwidth choice for kernel density estimates. Comput Stat 22(1):31–47
Jones MC, Kappenman RF (1991) On a class of kernel density estimate bandwidth selectors. Scand J Stat 19(4):337–349
Jones MC, Marron JS, Park BU (1991) A simple root n bandwidth selector. Ann Stat 19(4):1919–1932
Koláček J (2005) Kernel estimation of the regression function (in Czech). Ph.D. thesis, Masaryk University, Brno
Koláček J (2008) Plug-in method for nonparametric regression. Comput Stat 23(1):63–78
Koláček J, Horová I (2016) Selection of bandwidth for kernel regression. Commun Stat Theory Methods 45(5):1487–1500
Köhler M, Schindler A, Sperlich S (2014) A review and comparison of bandwidth selection methods for kernel regression. Int Stat Rev 82(2):243–274
Lafferty J, Wasserman L (2008) Rodeo: sparse, greedy nonparametric regression. Ann Stat 36:28–63
Lau G, Ooi PL, Phoon B (1998) Fatal falls from a height: the use of mathematical models to estimate the height of fall from the injuries sustained. Forensic Sci Int 93(1):33–44
Magnus JR, Neudecker H (1979) The commutation matrix: some properties and applications. Ann Stat 7(2):381–394
Magnus JR, Neudecker H (1999) Matrix differential calculus with applications in statistics and econometrics, 2nd edn. Wiley, New York
Manteiga WG, Miranda MM, González AP (2004) The choice of smoothing parameter in nonparametric regression through wild bootstrap. Comput Stat Data Anal 47(3):487–515
Rice J (1984) Bandwidth choice for nonparametric regression. Ann Stat 12(4):1215–1230
Ruppert D (1997) Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. J Am Stat Assoc 92(439):1049–1062
Ruppert D, Wand MP (1994) Multivariate locally weighted least squares regression. Ann Stat 22:1346–1370
Seifert B, Gasser T (1996) Variance properties of local polynomials and ensuing modifications. In: Härdle W, Schimek M (eds) Statistical theory and computational aspects of smoothing, contributions to statistics. Physica, Heidelberg, pp 50–79
Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J R Stat Soc Ser B (Methodological) 47:1–52
Simonoff JS (1996) Smoothing methods in statistics. Springer, New York
Staniswalis JG, Messer K, Finston DR (1993) Kernel estimators for multivariate regression. J Nonparametric Stat 3(2):103–121
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Stat Methodol 36(2):111–147
Wand M, Jones M (1993) Comparison of smoothing parameterizations in bivariate kernel density-estimation. J Am Stat Assoc 88(422):520–528
Wand M, Jones M (1995) Kernel smoothing. Chapman and Hall, London
Wand MP, Jones MC (1994) Multivariate plug-in bandwidth selection. Comput Stat 9(2):97–116
Yang L, Tschernig R (1999) Multivariate bandwidth selection for local linear regression. J R Stat Soc Ser B (Stat Methodol) 61(4):793–815
Zhang X, Brooks RD, King ML (2009) A Bayesian approach to bandwidth selection for multivariate kernel regression with an application to state-price density estimation. J Econom 153(1):21–32
Acknowledgements
This research was supported by Masaryk University, Project GAČR GA15-06991S.
Appendix: Proofs
First, we introduce some facts on matrix differential calculus and on the Gaussian density (see Magnus and Neudecker 1979, 1999; Aldershof et al. 1995).
Let \(\mathbf {A},\, \mathbf {B}\) be \(d\times d\) matrices:
- \(1^\circ \):
  \(\displaystyle {{\mathrm{tr}}}(\mathbf {A}^T \mathbf {B})=\mathrm {vec}^T \mathbf {A}\,\mathrm {vec}\,\mathbf {B}\)
- \(2^\circ \):
  \(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}(\mathbf {H}^{1/2} D^2 \mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=c\{{{\mathrm{tr}}}(\mathbf {H}D^2)\}m(\mathbf {x})\)
  \(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}^2(\mathbf {H}^{1/2} D^2 \mathbf {H}^{1/2}\mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=3c^2\{{{\mathrm{tr}}}^2(\mathbf {H}D^2)\}m(\mathbf {x})\)
  \(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}^k(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) {{\mathrm{tr}}}(\mathbf {H}^{1/2}D\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=0,\quad c,\,k\in \mathbb {N}_0\)
- \(3^\circ \):
  \(\displaystyle \int D^km(\mathbf {x})[D^km(\mathbf {x})]^T\,d\mathbf {x}= (-1)^k\int D^{2k}m(\mathbf {x})\,m(\mathbf {x})\,d\mathbf {x},\quad k\in \mathbb {N}\)
- \(4^\circ \): Let
  \(\displaystyle \varLambda (\mathbf {z})=\phi _{4\mathbf {I}}(\mathbf {z})-2\phi _{3\mathbf {I}}(\mathbf {z})+\phi _{2\mathbf {I}}(\mathbf {z})\);
  then using \(2^\circ \) yields
  \(\displaystyle \int \varLambda (\mathbf {z})\,d\mathbf {z}= 0\)
  \(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}= 0\)
  \(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}^2(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=6\{{{\mathrm{tr}}}^2(\mathbf {H}D^2)\}m(\mathbf {x})\)
  \(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}^k(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) {{\mathrm{tr}}}(\mathbf {H}^{1/2}D\mathbf {z}^T)\}m(\mathbf {x})\,d\mathbf {z}=0,\quad k\in \mathbb {N}_0\)
- \(5^\circ \): The Taylor expansion takes the form
$$\begin{aligned} \begin{aligned} m(\mathbf {x}-\mathbf {H}^{1/2}\mathbf {z})&= m(\mathbf {x})-\{\mathbf {z}^T\mathbf {H}^{1/2}D\}m(\mathbf {x})\\&\quad + \frac{1}{2!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^2\}m(\mathbf {x})+\dots \\&\quad +\frac{(-1)^k}{k!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^k\}m(\mathbf {x})\\&\quad + o(||\mathbf {H}^{1/2}\mathbf {z}||^k). \end{aligned} \end{aligned}$$
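These identities can be checked numerically. The sketch below (NumPy; an illustration only, not part of the proofs) verifies \(1^\circ \) exactly and the Gaussian moment identities of \(2^\circ \) by Monte Carlo. The fixed symmetric matrix \(\mathbf {A}\) is a stand-in for the operator \(\mathbf {H}^{1/2}D^2\mathbf {H}^{1/2}\), so the fourth moment appears in its matrix form \(c^2\{{{\mathrm{tr}}}^2\mathbf {A}+2\,{{\mathrm{tr}}}(\mathbf {A}^2)\}\); in the operator setting the two trace terms coincide because partial derivatives commute, which gives the factor \(3c^2\) above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
A = (A + A.T) / 2                                  # symmetric stand-in for H^{1/2} D^2 H^{1/2}

# 1°: tr(A^T B) = vec(A)^T vec(B)  (exact identity)
B = rng.standard_normal((d, d))
assert np.isclose(np.trace(A.T @ B),
                  A.flatten(order="F") @ B.flatten(order="F"))

# 2°: for z ~ N(0, cI),  E[tr(A z z^T)] = E[z^T A z] = c tr(A)
c = 2.0
z = np.sqrt(c) * rng.standard_normal((200_000, d))
quad = np.einsum("ni,ij,nj->n", z, A, z)           # z^T A z, one value per sample
print(quad.mean(), c * np.trace(A))                # ≈ equal

# E[(z^T A z)^2] = c^2 [(tr A)^2 + 2 tr(A^2)]; equals 3 c^2 tr^2 for the operator
print((quad ** 2).mean(),
      c**2 * (np.trace(A)**2 + 2 * np.trace(A @ A)))

# odd moment: E[(z^T A z)^k (a^T z)] = 0 for any fixed vector a (here k = 1)
a = rng.standard_normal(d)
print((quad * (z @ a)).mean())                     # ≈ 0
```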
Lemma 2
The integrated square bias can be expressed as
where the symbol \(*\) denotes convolution.
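The convolutions appearing here involve only Gaussian densities, for which \(\phi _{a\mathbf {I}}*\phi _{b\mathbf {I}}=\phi _{(a+b)\mathbf {I}}\) (the density of a sum of independent Gaussians); one can also observe that the function \(\varLambda \) of \(4^\circ \) factors as \((\phi _{2\mathbf {I}}-\phi _{\mathbf {I}})*(\phi _{2\mathbf {I}}-\phi _{\mathbf {I}})\). A minimal one-dimensional grid check of both facts (an illustration only):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-12, 12, 4001)
dx = x[1] - x[0]
phi = lambda c: norm.pdf(x, scale=np.sqrt(c))      # N(0, c) density

# phi_a * phi_b = phi_{a+b}
conv = np.convolve(phi(2.0), phi(3.0), mode="same") * dx
print(np.max(np.abs(conv - phi(5.0))))             # ≈ 0 up to grid error

# Lambda = phi_4 - 2 phi_3 + phi_2 = (phi_2 - phi_1) * (phi_2 - phi_1)
lam = phi(4.0) - 2 * phi(3.0) + phi(2.0)
lam2 = np.convolve(phi(2.0) - phi(1.0),
                   phi(2.0) - phi(1.0), mode="same") * dx
print(np.max(np.abs(lam - lam2)))                  # ≈ 0 up to grid error
```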
Proof
Each integral in the sum can be approximated in the following way:
Thus
Further
Sketch of the proof of Theorem 1:
Proof
In order to show that \(\widehat{\varGamma }\) is an asymptotically unbiased estimator of \(\varGamma \), we evaluate \(E(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))\):
Applying the Taylor expansion from \(5^{\circ }\) and using \(4^\circ \) yields
Using properties \(3^\circ \) and \(4^\circ \), we arrive at
To finish the proof of Theorem 1, it suffices to derive \({{\mathrm{Var}}}(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))\):
Since the estimator \(\widehat{{{\mathrm{AISB}}}}(\mathbf {H})\) is asymptotically unbiased and its variance tends to zero, the estimator is consistent. \(\square \)
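The consistency conclusion here, and likewise in the proof of Lemma 1 below, rests on the standard mean-square argument, spelled out for completeness:
$$\begin{aligned} E\bigl (\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})\bigr )^2 ={{\mathrm{Var}}}\bigl (\widehat{{{\mathrm{AISB}}}}(\mathbf {H})\bigr ) +\bigl [E\,\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})\bigr ]^2 \longrightarrow 0, \end{aligned}$$
and convergence in mean square implies convergence in probability by Chebyshev's inequality, \(P\bigl (|\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})|>\varepsilon \bigr )\le \varepsilon ^{-2}\, E\bigl (\widehat{{{\mathrm{AISB}}}}(\mathbf {H})-{{\mathrm{AISB}}}(\mathbf {H})\bigr )^2\).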
Sketch of the proof of Lemma 1:
Proof
In order to show that \(\widehat{\psi }_{4,0}\) is an asymptotically unbiased estimator of \(\psi _{4,0}\), we evaluate \(E(\widehat{\psi }_{4,0})\):
Applying the Taylor expansion from \(5^{\circ }\) yields
To finish the proof of Lemma 1, it suffices to derive \({{\mathrm{Var}}}(\widehat{\psi }_{4,0})\):
Since the estimator \(\widehat{\psi }_{4,0}\) is asymptotically unbiased and its variance tends to zero, the estimator is consistent. \(\square \)