The choice of smoothing parameter in nonparametric regression through Wild Bootstrap

https://doi.org/10.1016/j.csda.2003.12.007

Abstract

A bootstrap method to estimate the mean squared error and the smoothing parameter for the multidimensional local linear regression estimator is proposed. This method is based on resampling of the estimated residuals. It uses a bootstrap estimator of the mean squared error to select an asymptotically optimal bandwidth parameter. This is achieved by showing that the mean squared error and its bootstrap estimator are very close. Thus, the smoothing parameter minimizing the mean squared error is asymptotically close to the smoothing parameter minimizing the bootstrap estimator of the mean squared error. The results are extended to the case in which the response variable contains missing observations.

Introduction

The bootstrap method was first introduced by Efron (1979). Since then, many works have been published in which bootstrap methodology has been developed in different contexts, including curve estimation and finite populations, the principal aim being the construction of confidence intervals, the approximation of critical points in hypothesis tests, etc.

The applications to regression models (of the nonparametric type) are of special importance. Assume a sample $\{(X_i^t, Y_i)\}_{i=1}^n$ of independent and identically distributed (i.i.d.) $(d+1)$-dimensional observations of $(X^t, Y)$ following the model

$$Y = m(X) + v^{1/2}(X)\,\varepsilon = m(X) + \eta, \tag{1}$$

in which $m(x) = E[Y \mid X = x]$, $x \in \mathbb{R}^d$, is the regression function, $v(x) = \mathrm{Var}[Y \mid X = x]$ is the conditional variance function, and $\varepsilon$ is a random variable with mean zero and variance one; our aim is to make inferences about $m$.

When dealing with regression models of type (1) in the presence of heteroscedasticity, the so-called “Wild Bootstrap” method is very useful. This method was introduced by Wu (1986) in the more specific context of linear regression models $m(x) = m_\beta(x) = x^t\beta$, where the superscript $t$ denotes the transpose; it was later given the above-mentioned name in Härdle and Mammen (1993). For sample generation, the general procedure is as follows:

  • (a) The residuals $\hat\eta_i = Y_i - \hat m_0(X_i)$ are constructed, where $\hat m_0$ is a pilot estimator of $m$.

  • (b) Bootstrap residuals $\eta_i^*$, $i = 1, \dots, n$, are drawn, satisfying $E^*[\eta_i^*] = 0$, $E^*[\eta_i^{*2}] = \hat\eta_i^2$, $E^*[\eta_i^{*3}] = \hat\eta_i^3$, ….

  • (c) The bootstrap sample $\{(X_i^t, Y_i^*)\}_{i=1}^n$ is constructed with $Y_i^* = \hat m_1(X_i) + \eta_i^*$, $i = 1, \dots, n$, where $\hat m_1$ is another pilot estimator; and

  • (d) The inference about $m$ is obtained by replicating the previous process $B$ times.
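Steps (a) to (d) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the pilot estimator is taken to be a univariate Nadaraya–Watson fit with a Gaussian kernel, the same pilot is used for both $\hat m_0$ and $\hat m_1$, and the bootstrap residuals are generated with the two-point "golden section" distribution, which is one common choice satisfying the three moment conditions in step (b). The function names `nw_pilot`, `golden_section_draws` and `wild_bootstrap_samples` are hypothetical.

```python
import numpy as np

def nw_pilot(x_eval, X, Y, g):
    """Nadaraya-Watson fit with a Gaussian kernel (assumed pilot estimator, d = 1)."""
    u = (x_eval[:, None] - X[None, :]) / g
    w = np.exp(-0.5 * u**2)
    return (w @ Y) / w.sum(axis=1)

def golden_section_draws(n, rng):
    """Two-point variables V with E[V] = 0, E[V^2] = 1 and E[V^3] = 1."""
    s5 = np.sqrt(5.0)
    a, b = (1 - s5) / 2, (1 + s5) / 2
    p = (5 + s5) / 10                      # P(V = a); P(V = b) = 1 - p
    return np.where(rng.random(n) < p, a, b)

def wild_bootstrap_samples(X, Y, g, B, rng):
    """Return a (B, n) array whose rows are bootstrap response vectors Y*."""
    fitted = nw_pilot(X, X, Y, g)          # pilot fit, used as both m0 and m1 (assumption)
    resid = Y - fitted                     # step (a): estimated residuals
    Y_star = np.empty((B, len(Y)))
    for b in range(B):                     # step (d): B replications
        eta_star = resid * golden_section_draws(len(Y), rng)  # step (b)
        Y_star[b] = fitted + eta_star      # step (c)
    return Y_star
```

Multiplying each residual by an independent draw $V_i$ gives $E^*[\eta_i^*] = 0$, $E^*[\eta_i^{*2}] = \hat\eta_i^2$ and $E^*[\eta_i^{*3}] = \hat\eta_i^3$, so each bootstrap residual keeps the scale of its own observation, which is what makes the scheme suitable under heteroscedasticity.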

Regarding the estimation of $m$, in Cao-Abad (1991), for example, this type of resampling is applied with the aim of approximating the distribution

$$P^{Y \mid X}\{(nh)^{1/2}(\hat m_h(x) - m(x)) \le z\},$$

where $Y \mid X$ denotes conditioning on the values of the covariates $X_i$, $i = 1, \dots, n$, $\hat m_h$ is the Nadaraya–Watson estimator of $m$ (see Nadaraya, 1964; Watson, 1964), $d = 1$ and $\hat m_1 = \hat m_g$, with $g$ being a pilot bandwidth. In Härdle et al. (1995) and Neumann (1997) the wild bootstrap method is used to approximate the distribution of studentized versions of the estimator $\hat m_h$. Furthermore, in Neumann and Polzehl (1998), the study is extended to the construction of confidence bands for the curve $m$.

When the objective is hypothesis testing about $m$, for example testing $H_0: m \in \{m_\theta\}_{\theta \in \mathbb{R}^q}$, noteworthy applications have been developed by Härdle and Mammen (1993) and by Stute et al. (1998), in which $\hat m_1$ or $\hat m_0$ depends on $m_{\hat\theta}$, which reflects the null hypothesis; see also the excellent study by Mammen (2000).

In the context of nonparametric regression, for example of the kernel type, one of the key points is the choice of the bandwidth parameter (see, among others, an early paper by Vieu (1993)). Thus, given a nonparametric estimator

$$\hat m_H(x) = \sum_{i=1}^n w_i(x, H)\,Y_i, \quad x \in \mathbb{R}^d, \tag{3}$$

in which $\{w_i(x, H)\}_{i=1}^n$ is a sequence of weights that depend on the smoothing parameter $H$ (normally a matrix parameter), it is very important to choose $H$ in such a way that the conditional mean squared error (MSE) is minimized:

$$\mathrm{MSE}(x; H) = E[(\hat m_H(x) - m(x))^2 \mid X_1, \dots, X_n] = E^{Y \mid X}[(\hat m_H(x) - m(x))^2],$$

or, more generally, unconditional versions.
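Since the weights depend only on the covariates and, under model (1), the responses are conditionally independent with conditional variance $v(X_i)$, the conditional MSE of a linear smoother splits into a squared bias and a variance. The identity below is a standard consequence of these assumptions rather than a formula quoted from the paper; it shows why the MSE cannot be minimized directly, since both $m$ and $v$ are unknown.

```latex
\mathrm{MSE}(x;H)
  = \Bigl(\sum_{i=1}^{n} w_i(x,H)\, m(X_i) - m(x)\Bigr)^{2}
  + \sum_{i=1}^{n} w_i(x,H)^{2}\, v(X_i)
```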

In recent years several proposals have been made concerning the choice of the smoothing parameter, focused on local linear estimators of type (3). That is, $\hat m_H(x) = \hat\alpha$, resulting from

$$\min_{\alpha, \beta} \sum_{i=1}^n \{Y_i - \alpha - \beta^t(X_i - x)\}^2 K_H(X_i - x),$$

where $K_H(u) = |H|^{-1/2} K(H^{-1/2}u)$, $H$ is a symmetric positive definite $d \times d$ matrix and $K$ is a $d$-dimensional kernel function, that is, $K \ge 0$ and $\int K(u)\,du = 1$. The expression for this estimator is then

$$\hat m_H(x) = e_1^t (X_x^t W_{x,H} X_x)^{-1} X_x^t W_{x,H} Y, \tag{4}$$

where

$$X_x = \begin{pmatrix} 1 & (X_1 - x)^t \\ \vdots & \vdots \\ 1 & (X_n - x)^t \end{pmatrix}, \quad W_{x,H} = \mathrm{diag}(K_H(X_i - x))_{i=1}^n, \quad Y = (Y_1, \dots, Y_n)^t,$$

and $e_1$ is the $(d+1) \times 1$ vector with 1 in the first coordinate and zero elsewhere. This estimator has been studied by Ruppert and Wand (1994). Among the family of proposed selectors for the parameter $H$ of estimator (4), the so-called “plug-in type” (see for instance Yang and Tschernig, 1999) and the empirical-bias bandwidth of Ruppert (1997) are prominent. In this paper, we consider the bootstrap selector as an alternative mechanism, which consists of choosing as the estimator $\hat H$ of $H$ the minimizer of

$$\mathrm{MSE}^*(x; H, G) = E^*[(\hat m_H^*(x) - \hat m_G(x))^2], \tag{5}$$

where $E^*$ denotes the expectation operator over the bootstrap distribution and $\hat m_H^*$ is estimator (4) with smoothing parameter $H$, constructed from the bootstrap sample $\{(X_i^t, Y_i^*)\}_{i=1}^n$, which is obtained as in (2) with $\hat m_0 = \hat m_1 = \hat m_G$; the pilot estimator $\hat m_G$ is an estimator of type (4) with pilot bandwidth $G$.

Although earlier studies regarding the choice of the smoothing parameter in curve estimation through the bootstrap do exist (see for example Hall (1990) or Cao-Abad (1993) in density estimation; Saavedra and Cao-Abad (2001) for moving average processes; Delaigle and Gijbels (2004) in deconvolution kernel density estimation, etc.), little has been done with respect to the bootstrap selector in the regression context (see Mammen, 2000).
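The multidimensional local linear estimator is a kernel-weighted least-squares fit evaluated at $x$, so it can be sketched directly from its matrix expression. The code below is an illustration under assumptions: the kernel $K$ is taken to be the standard $d$-variate Gaussian density (the text only requires $K \ge 0$ integrating to one), and the function name `local_linear` is hypothetical.

```python
import numpy as np

def local_linear(x, X, Y, H):
    """Local linear estimate e1^t (Xx^t W Xx)^{-1} Xx^t W Y at the point x.

    X: (n, d) covariates, Y: (n,) responses, x: (d,) evaluation point,
    H: (d, d) symmetric positive definite bandwidth matrix.
    """
    n, d = X.shape
    diff = X - x                                    # rows (X_i - x)^t
    # H^{-1/2} from the eigendecomposition of the symmetric matrix H
    evals, evecs = np.linalg.eigh(H)
    H_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    u = diff @ H_inv_sqrt                           # rows (H^{-1/2}(X_i - x))^t
    # Gaussian kernel (assumption): K_H(u) = |H|^{-1/2} K(H^{-1/2} u)
    k = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)
    w = k / np.sqrt(np.linalg.det(H))
    Xx = np.column_stack([np.ones(n), diff])        # n x (d+1) design matrix
    XtW = Xx.T * w                                  # Xx^t W without forming diag(w)
    coef = np.linalg.solve(XtW @ Xx, XtW @ Y)       # (alpha_hat, beta_hat)
    return coef[0]                                  # alpha_hat = estimate of m(x)
```

A quick sanity check is that the estimator reproduces exactly linear data: if $Y_i = a + b^t X_i$ without noise, the returned value equals $a + b^t x$ for any admissible $H$.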

In this paper we study a bandwidth selector for (4) based on bootstrap mean squared error (5). In order to do this, we prove the consistency of the Wild bootstrap method for MSE using the so-called “imitation technique” (see Shao and Tu (1995) for more details). Furthermore, we offer some indications as to how to choose the smoothing parameter G used in the expression of the bootstrap MSE (5), which is here denoted the pilot bandwidth.
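For a linear smoother, the bootstrap MSE does not have to be approximated by Monte Carlo. Since $E^*[Y_i^*] = \hat m_G(X_i)$ and $\mathrm{Var}^*(Y_i^*) = \hat\eta_i^2$ follow from the first two moment conditions of the wild bootstrap, the bootstrap MSE equals $(\sum_i w_i(x,H)\hat m_G(X_i) - \hat m_G(x))^2 + \sum_i w_i(x,H)^2\hat\eta_i^2$. The sketch below uses this identity to select a bandwidth over a grid. It is an illustration under assumptions, not necessarily the computation used in the paper: it is restricted to $d = 1$ with scalar bandwidths `h` and `g`, it uses a Gaussian kernel, and the names `ll_weights`, `bootstrap_mse` and `select_bandwidth` are hypothetical.

```python
import numpy as np

def ll_weights(x, X, h):
    """Weights l_i(x, h) of the univariate local linear smoother: m_hat(x) = l @ Y."""
    diff = X - x
    k = np.exp(-0.5 * (diff / h) ** 2) / h          # Gaussian kernel (assumption)
    Xx = np.column_stack([np.ones_like(X), diff])
    XtW = Xx.T * k
    return np.linalg.solve(XtW @ Xx, XtW)[0]        # first row of (Xx^t W Xx)^{-1} Xx^t W

def bootstrap_mse(x, X, fit_g, m_g_x, resid, h):
    """Closed-form bootstrap MSE at x for bandwidth h, given the pilot quantities."""
    l = ll_weights(x, X, h)
    bias_star = l @ fit_g - m_g_x                   # E*[m*_h(x)] - m_g(x)
    var_star = np.sum(l**2 * resid**2)              # Var*[m*_h(x)]
    return bias_star**2 + var_star

def select_bandwidth(x, X, Y, h_grid, g):
    """Return the h in h_grid minimizing the bootstrap MSE at x, for pilot bandwidth g."""
    fit_g = np.array([ll_weights(xi, X, g) @ Y for xi in X])  # pilot fit at the sample points
    m_g_x = ll_weights(x, X, g) @ Y                           # pilot fit at x
    resid = Y - fit_g                                         # estimated residuals
    scores = [bootstrap_mse(x, X, fit_g, m_g_x, resid, h) for h in h_grid]
    return h_grid[int(np.argmin(scores))]
```

The pilot bandwidth `g` is an input here; the selected `h` can be sensitive to it, which is why its choice is treated separately.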

To complete this study, we extend it to the case in which the response variable contains missing observations. For this purpose, we present two possible nonparametric estimators of the regression function in this context, and propose an appropriate resampling method based on the wild bootstrap.

Section 2 presents theorems to demonstrate the asymptotic validity of the bootstrap MSE for complete and incomplete data, while Section 3 offers some ideas of how the pilot bandwidth should be selected for each case. The simulations are described in Section 4 and some conclusions are drawn in Section 5. Finally, Appendix A presents the proofs of the theorems given in previous sections.

Complete data case

In this section we show that the wild bootstrap resampling mechanism is consistent for the mean squared error of the multidimensional local linear estimator (4). Therefore, it can be used as a selection mechanism for the bandwidth parameter.

The resampling mechanism is analogous to that specified in (2) in the Introduction, with the following options: in step (a) the residuals are constructed as $\hat\eta_i = Y_i - \hat m_G(X_i)$, where $\hat m_G(X_i)$ is the multidimensional local linear estimator (4) with pilot

Pilot bandwidth selection

Once the consistency of the bootstrap has been ensured, the next step is to see how the pilot bandwidth should be chosen. It is known that the choice of this parameter plays an important role in the behaviour of the bootstrap, especially for small samples. As Härdle and Marron (1991) pointed out, the principal function of the pilot parameter is to provide a good fit of the bias, and so bias estimation is used as the error criterion.

Like these authors, we calculate the asymptotic mean squared error

Simulations

A brief simulation study was carried out to illustrate the behaviour for finite samples of the bootstrap estimation proposed for the MSE of the local linear estimator of the regression function (4). The study was carried out both for the case of complete data and for that of missing observations. On the one hand, our aim is to verify the validity of the bootstrap approach of the MSE, and on the other, to test the practical behaviour of the bootstrap bandwidth selector resulting from the

Conclusions

It is well known that the behaviour of many nonparametric estimators of the regression function depends to a great degree on the selection of a bandwidth parameter. This is normally chosen according to a theoretical error criterion, such as the mean squared error (MSE), the asymptotic mean squared error (AMSE) or their integrated versions, MISE and AMISE, respectively.

The disadvantage of these selection methods lies in the fact that a priori unknown quantities appear in the expression of the

Acknowledgements

We are grateful to the referees and the Associate Editor for their constructive suggestions.

References

  • W. Härdle et al. Comparing nonparametric versus parametric regression fits. Ann. Statist. (1993)
  • W. Härdle et al. Bootstrap simultaneous error bars for nonparametric regression. Ann. Statist. (1991)
  • R.J.A. Little et al. Statistical analysis with missing data. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics (2002)
1 Research supported in part by MCyT Grant BFM2002-03213 (European FEDER support included) and the project PGIDIT03PXIC20702PN, Dirección Xeral de I+D, Xunta de Galicia.
