Abstract
We consider kernel methods for constructing nonparametric estimators of a regression function from incomplete data. To handle incomplete covariates, we employ Horvitz–Thompson-type inverse weighting, in which complete cases are weighted by the inverse of their selection probabilities. The unknown selection probabilities are themselves estimated using (1) kernel regression, when the functional form of these probabilities is completely unknown, and (2) the least-squares method, when the selection probabilities belong to a known class of candidate functions. To assess the overall performance of the proposed estimators, we establish exponential upper bounds on the \(L_p\) norms, \(1\le p<\infty \), of our estimators; these bounds immediately yield several strong convergence results. We also apply our results to the problem of statistical classification with partially observed covariates.
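The following Python sketch illustrates the inverse-weighting idea under simplifying assumptions that are ours, not the paper's. It assumes a scalar covariate X that is missing when δ = 0, a binary response Y that is always observed, and a selection probability π(v) = P(δ = 1 | V = v) that depends only on the always-observed variable V = Y (missing at random given V). Complete cases are reweighted by 1/π̂(V_i), where π̂ comes either from Nadaraya–Watson kernel regression of δ on V or from least squares over an assumed finite class of logistic curves. The function names (gaussian_kernel, pi_kernel, pi_least_squares, ipw_kernel_regression), the Gaussian kernel, the bandwidths, the logistic candidate class and the simulated model are all hypothetical choices for illustration; the authors' exact estimators and conditions may differ.

```python
import numpy as np


def gaussian_kernel(u):
    # Unnormalised Gaussian kernel; normalising constants cancel in every ratio below.
    return np.exp(-0.5 * u**2)


def pi_kernel(v_eval, V, delta, lam):
    # Hypothetical helper: Nadaraya-Watson regression of the indicator delta on the
    # always-observed V, for when the form of pi(v) = P(delta = 1 | V = v) is unknown.
    w = gaussian_kernel((v_eval[:, None] - V[None, :]) / lam)
    return (w @ delta) / w.sum(axis=1)


def pi_least_squares(V, delta, a_grid, b_grid):
    # Hypothetical helper: least squares over an ASSUMED finite class of candidates
    # pi(v) = 1 / (1 + exp(-(a + b v))), minimising sum_j (delta_j - pi(V_j))^2.
    best, best_sse = None, np.inf
    for a in a_grid:
        p = 1.0 / (1.0 + np.exp(-(a + b_grid[:, None] * V[None, :])))
        sse = ((delta[None, :] - p) ** 2).sum(axis=1)
        k = int(np.argmin(sse))
        if sse[k] < best_sse:
            best, best_sse = (a, b_grid[k]), sse[k]
    a, b = best
    return 1.0 / (1.0 + np.exp(-(a + b * V)))


def ipw_kernel_regression(x_eval, X, Y, delta, pi_hat, h):
    # Horvitz-Thompson-type kernel estimate: only complete cases (delta_i = 1) enter,
    # each weighted by K((x - X_i)/h) / pi_hat(V_i).
    obs = delta == 1
    w = gaussian_kernel((x_eval[:, None] - X[obs][None, :]) / h) / pi_hat[obs][None, :]
    return (w @ Y[obs]) / w.sum(axis=1)


# Simulated data (assumed model): m(x) = P(Y = 1 | X = x) = x, and X is
# missing with a probability that depends on Y only.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, n)
Y = (rng.uniform(size=n) < X).astype(float)
pi_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * Y)))
delta = (rng.uniform(size=n) < pi_true).astype(float)
X_obs = np.where(delta == 1, X, np.nan)  # the estimators only index complete cases

# (1) kernel estimate of pi, (2) least-squares estimate over the logistic grid.
pi_k = pi_kernel(Y, Y, delta, lam=0.1)
pi_ls = pi_least_squares(Y, delta, np.linspace(-2, 2, 41), np.linspace(-3, 3, 61))

x_grid = np.array([0.3, 0.5, 0.7])
m_k = ipw_kernel_regression(x_grid, X_obs, Y, delta, pi_k, h=0.08)
m_ls = ipw_kernel_regression(x_grid, X_obs, Y, delta, pi_ls, h=0.08)
m_cc = ipw_kernel_regression(x_grid, X_obs, Y, delta, np.ones(n), h=0.08)  # unweighted

# Plug-in classifier: predict label 1 where the estimated P(Y = 1 | X = x) exceeds 1/2.
labels = (m_k > 0.5).astype(int)
print("IPW (kernel pi):       ", np.round(m_k, 3))
print("IPW (least-squares pi):", np.round(m_ls, 3))
print("complete-case only:    ", np.round(m_cc, 3))
print("plug-in labels:        ", labels)
```

Because selection here depends on Y, the unweighted complete-case estimate (obtained by passing π̂ ≡ 1) is pulled toward the label with the higher selection probability, whereas the reweighted estimates should stay near the true m(x) = x; thresholding m̂ at 1/2 gives a plug-in classifier of the kind the abstract refers to. With a binary V the kernel estimate of π reduces to the observed frequency of δ within each label, but the same helper applies unchanged to a continuous V.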
Acknowledgments
This work is supported by the National Science Foundation Grant DMS-1407400 of Majid Mojirsheibani.
Cite this article
Reese, T., Mojirsheibani, M. On the \(L_p\) norms of kernel regression estimators for incomplete data with applications to classification. Stat Methods Appl 26, 81–112 (2017). https://doi.org/10.1007/s10260-016-0359-6