On robust cross-validation for nonparametric smoothing

  • Original Paper
Computational Statistics

Abstract

An essential problem in nonparametric smoothing of noisy data is the proper choice of the bandwidth or window width, which depends on a smoothing parameter \(k\). One way to choose \(k\) based on the data is leave-one-out cross-validation. The selection of the cross-validation criterion is as important as the choice of the smoother. In particular, when outliers are present, robust cross-validation criteria are needed. So far, little is known about the behaviour of robust cross-validated smoothers in the presence of discontinuities in the regression function. We combine different smoothing procedures based on local constant fits with each of several cross-validation criteria. These combinations are compared in a simulation study under a broad variety of data situations with outliers and abrupt jumps. There is no single overall best cross-validation criterion, but we find that Boente cross-validation performs well for large percentages of outliers and that the Tukey criterion performs well in data situations with jumps, even if the data are contaminated with outliers.
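
As a rough illustration of the idea described in the abstract, the following sketch chooses the window half-width \(k\) of a local constant smoother by leave-one-out cross-validation. It is not the authors' implementation: the function names (`loo_local_constant`, `cv_score`, `select_k`), the equidistant design, the symmetric window of half-width `k`, and the two criteria shown (mean squared and mean absolute prediction error) are all assumptions made for illustration. The Boente and Tukey criteria studied in the paper are not reproduced here, since their exact definitions are not given in this abstract.

```python
import numpy as np


def loo_local_constant(y, k, location=np.median):
    """Leave-one-out local constant fit (hypothetical helper).

    For each index i, the level is estimated from the observations within
    distance k of i, with y[i] itself left out. Requires k >= 1, len(y) >= 2.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    fit = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        neighbours = np.concatenate((y[lo:i], y[i + 1:hi]))
        fit[i] = location(neighbours)
    return fit


def cv_score(y, k, location=np.median, criterion="l1"):
    """Cross-validation score for half-width k under a chosen criterion."""
    y = np.asarray(y, dtype=float)
    resid = y - loo_local_constant(y, k, location)
    if criterion == "l2":   # classical least-squares criterion
        return float(np.mean(resid ** 2))
    if criterion == "l1":   # absolute-error criterion, less outlier-sensitive
        return float(np.mean(np.abs(resid)))
    raise ValueError(f"unknown criterion: {criterion!r}")


def select_k(y, candidates, **kwargs):
    """Return the candidate k with the smallest score, plus all scores."""
    scores = {k: cv_score(y, k, **kwargs) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Simulated data (assumed setup): one jump, Gaussian noise, a few outliers.
    rng = np.random.default_rng(0)
    x = np.arange(200)
    signal = np.where(x < 100, 0.0, 5.0)
    y = signal + rng.normal(scale=1.0, size=x.size)
    y[rng.choice(x.size, size=10, replace=False)] += 20.0
    for crit in ("l2", "l1"):
        k_best, _ = select_k(y, range(1, 31), criterion=crit)
        print(crit, k_best)
```

Passing `location=np.mean` instead of the default `np.median` swaps the running median for a running mean, so that different smoother and criterion combinations can be compared on the same data in the spirit of the simulation study.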

Acknowledgments

This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823) of the German Research Foundation (DFG). The helpful and stimulating comments of the referees and the associate editor are also acknowledged.

Author information

Corresponding author

Correspondence to Oliver Morell.

About this article

Cite this article

Morell, O., Otto, D. & Fried, R. On robust cross-validation for nonparametric smoothing. Comput Stat 28, 1617–1637 (2013). https://doi.org/10.1007/s00180-012-0369-2

  • DOI: https://doi.org/10.1007/s00180-012-0369-2
