Abstract
Parametric regression, such as linear regression, plays an important role in statistics. Using a parametric regression model typically requires specifying a regression function of the covariates, the distribution of the response, and the link between the response and the covariates, each of which is at risk of misspecification. In this paper, we introduce a fully nonparametric regression model, a Polya tree (PT)-based nearest neighborhood regression. To approximate the true conditional probability measure of the response given the covariate value, we construct a PT-distributed probability measure of the response in the nearest neighborhood of the covariate value of interest. Our proposed method gives consistent and robust estimators and has a faster convergence rate than kernel density estimation. We conduct extensive simulation studies and analyze a Combined Cycle Power Plant dataset to compare the performance of our method with that of kernel density estimation, PT density estimation, and the linear dependent tail-free process (LDTFP). The studies suggest that the proposed method outperforms the kernel and PT density estimation methods in terms of estimation accuracy and convergence rate, and outperforms LDTFP in terms of robustness.
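To make the idea of a PT-distributed measure on a nearest neighborhood concrete, the following is a minimal sketch and not the authors' construction. It assumes a scalar covariate with absolute-distance neighbors, a finite Polya tree of fixed depth centered on a normal distribution fitted to the neighbors' responses, and the common parameter choice alpha_j = c * j^2. The function names `k_nearest_responses` and `pt_predictive_density` and the defaults `c=1.0` and `depth=6` are hypothetical choices made for illustration.

```python
import math


def normal_cdf(y, mu, sigma):
    """CDF of N(mu, sigma^2), used here as the assumed centering distribution."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def k_nearest_responses(xs, ys, x0, k):
    """Responses of the k observations whose scalar covariate is closest to x0."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return [ys[i] for i in order[:k]]


def pt_predictive_density(y, data, c=1.0, depth=6):
    """Posterior predictive density at y under a finite Polya tree.

    Assumptions (not taken from the paper): the tree is centered on a normal
    distribution fitted to `data`, level-j sets are the 2^j equal-probability
    quantile intervals of that normal, and alpha_j = c * j^2.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two responses")
    mu = sum(data) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in data) / (n - 1))
    if sigma == 0.0:
        raise ValueError("responses must not all be equal")

    # Map everything to (0, 1) so the level-j sets become [k / 2^j, (k + 1) / 2^j).
    u_data = [normal_cdf(d, mu, sigma) for d in data]
    u = normal_cdf(y, mu, sigma)

    density = normal_pdf(y, mu, sigma)
    parent_count = n  # number of responses in the level-(j - 1) set containing y
    for j in range(1, depth + 1):
        cells = 2 ** j
        idx = min(int(u * cells), cells - 1)
        count = sum(1 for v in u_data if min(int(v * cells), cells - 1) == idx)
        alpha = c * j * j
        # Posterior mean branch probability, rescaled by its prior mean of 1/2.
        density *= 2.0 * (alpha + count) / (2.0 * alpha + parent_count)
        parent_count = count
    return density


# Usage: estimate the conditional density of the response near x0 = 0.5.
xs = [0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.9, 1.3]
ys = [1.0, 1.4, 1.1, 1.9, 1.6, 2.2, 2.0, 3.1]
neighbors = k_nearest_responses(xs, ys, x0=0.5, k=5)
estimate = pt_predictive_density(1.5, neighbors)
```

As `c` grows, every level's correction factor tends to 1 and the estimate falls back to the fitted normal density; small `c` lets the neighbors' empirical counts dominate. Choosing the neighborhood size and the centering distribution is where a real implementation would need the care described in the paper.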
Acknowledgements
This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC; Grant ID: 299493 (YY) and 1280961 (LD)). Yi is a Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs Program.
Cite this article
Zhuang, H., Diao, L. & Yi, G. Polya tree-based nearest neighborhood regression. Stat Comput 32, 59 (2022). https://doi.org/10.1007/s11222-021-10076-w
DOI: https://doi.org/10.1007/s11222-021-10076-w