Abstract
An essential problem in nonparametric smoothing of noisy data is the proper choice of the bandwidth or window width, which depends on a smoothing parameter \(k\). One way to choose \(k\) from the data is leave-one-out cross-validation. The choice of the cross-validation criterion is as important as the choice of the smoother. In particular, when outliers are present, robust cross-validation criteria are needed. So far, little is known about the behaviour of robust cross-validated smoothers in the presence of discontinuities in the regression function. We combine different smoothing procedures based on local constant fits with each of several cross-validation criteria. These combinations are compared in a simulation study under a broad variety of data situations with outliers and abrupt jumps. No single cross-validation criterion is best overall, but we find that Boente cross-validation performs well when the percentage of outliers is large, and that the Tukey criterion performs well in data situations with jumps, even if the data are contaminated with outliers.
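To make the idea of choosing \(k\) by leave-one-out cross-validation concrete, the following is a minimal sketch, not the procedure studied in the article. It assumes equally spaced observations, a symmetric window of half-width \(k\), and a local constant fit given by the median (or mean) of the neighbours. The function names `loo_fit`, `cv_score` and `select_k` are hypothetical, and the two losses shown (absolute and squared error) are generic stand-ins; the Boente and Tukey criteria named in the abstract are not reproduced here.

```python
import numpy as np

def loo_fit(y, i, k, center=np.median):
    """Local constant fit at index i from a window of half-width k, leaving out y[i].

    Assumes k >= 1 and len(y) >= 2, so at least one neighbour remains.
    """
    lo, hi = max(0, i - k), min(len(y), i + k + 1)
    neighbours = np.concatenate([y[lo:i], y[i + 1:hi]])
    return center(neighbours)

def cv_score(y, k, center=np.median, loss=np.abs):
    """Mean leave-one-out prediction loss for half-width k."""
    residuals = np.array([y[i] - loo_fit(y, i, k, center) for i in range(len(y))])
    return float(np.mean(loss(residuals)))

def select_k(y, candidates, center=np.median, loss=np.abs):
    """Return the candidate k with the smallest cross-validation score, plus all scores."""
    scores = {k: cv_score(y, k, center, loss) for k in candidates}
    return min(scores, key=scores.get), scores

# Illustrative data (an assumption, not from the article): a step function
# with Gaussian noise and a few large positive outliers.
rng = np.random.default_rng(0)
signal = np.where(np.arange(200) < 100, 0.0, 5.0)
y_demo = signal + rng.normal(scale=0.5, size=200)
y_demo[rng.choice(200, size=10, replace=False)] += 15.0

k_l1, _ = select_k(y_demo, range(1, 21), center=np.median, loss=np.abs)
k_l2, _ = select_k(y_demo, range(1, 21), center=np.mean, loss=np.square)
```

Swapping `loss` and `center` is the point of the sketch: the same smoother can be paired with different criteria, and the selected \(k\) may differ between them when outliers or jumps are present.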
Acknowledgments
This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823) of the German Research Foundation (DFG). The helpful and stimulating comments of the referees and the associate editor are also acknowledged.
Cite this article
Morell, O., Otto, D. & Fried, R. On robust cross-validation for nonparametric smoothing. Comput Stat 28, 1617–1637 (2013). https://doi.org/10.1007/s00180-012-0369-2
DOI: https://doi.org/10.1007/s00180-012-0369-2