Polya tree-based nearest neighborhood regression

Statistics and Computing

Abstract

Parametric regression, such as linear regression, plays an important role in statistics. Using a parametric regression model typically requires specifying a regression function of the covariates, the distribution of the response, and the link between the response and the covariates, each of which is at risk of misspecification. In this paper, we introduce a fully nonparametric regression model, Polya tree (PT)-based nearest neighborhood regression. To approximate the true conditional probability measure of the response given a covariate value, we construct a PT-distributed probability measure of the response in the nearest neighborhood of the covariate value of interest. The proposed method yields consistent and robust estimators and has a faster convergence rate than kernel density estimation. We conduct extensive simulation studies and analyze a Combined Cycle Power Plant dataset to compare the performance of our method with that of kernel density estimation, PT density estimation, and the linear dependent tail-free process (LDTFP). The studies suggest that the proposed method outperforms the kernel and PT density estimation methods in estimation accuracy and convergence rate, and outperforms LDTFP in robustness.
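To make the construction in the abstract more concrete, here is a minimal sketch of the general idea: collect the responses of the training points nearest to a covariate value x0, then compute a Polya tree posterior predictive density from those responses. The abstract does not state the authors' specific choices, so every choice below is an illustrative assumption rather than the paper's method: Euclidean distance for neighbors, a normal base measure centered at the neighbors' sample mean and standard deviation, dyadic quantile partitions of that base measure, a finite tree depth, and Beta parameters alpha_m = c * m**2 at level m (a commonly used canonical choice). The function names `nearest_responses`, `polya_tree_density`, and `pt_nn_density` and the parameters `k`, `depth`, and `c` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def nearest_responses(X, y, x0, k):
    """Responses of the k training points whose covariates are closest to x0.

    Euclidean distance is an illustrative assumption.
    """
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x0))
    return [y[i] for i in order[:k]]


def polya_tree_density(sample, t, depth=8, c=1.0):
    """Posterior predictive density at t of a truncated Polya tree fitted to sample.

    Assumed (hypothetical) choices: a normal base measure centred at the sample
    mean and standard deviation, dyadic quantile partitions of that base
    measure, and Beta parameters alpha_m = c * m**2 at level m.
    """
    base = NormalDist(fmean(sample), max(stdev(sample), 1e-8))
    # Map observations and the evaluation point into [0, 1] via the base CDF,
    # so the level-m partition sets become intervals of width 2**-m.
    u_obs = [base.cdf(v) for v in sample]
    u_t = base.cdf(t)
    density = base.pdf(t)
    parent_count = len(sample)  # all observations lie in the level-0 set
    for m in range(1, depth + 1):
        width = 2 ** m
        cell = min(int(u_t * width), width - 1)  # level-m set containing t
        child_count = sum(1 for u in u_obs if min(int(u * width), width - 1) == cell)
        alpha = c * m ** 2
        # Posterior probability of branching into t's child set, divided by
        # the base measure's branch probability 1/2.
        density *= 2 * (alpha + child_count) / (2 * alpha + parent_count)
        parent_count = child_count
    return density


def pt_nn_density(X, y, x0, t, k=50, depth=8, c=1.0):
    """Estimated conditional density of the response at t given covariate x0."""
    return polya_tree_density(nearest_responses(X, y, x0, k), t, depth=depth, c=c)


if __name__ == "__main__":
    random.seed(0)
    # Toy data with one covariate: y = 2x + N(0, 0.1^2) noise.
    X = [(random.random(),) for _ in range(200)]
    y = [2 * x[0] + random.gauss(0, 0.1) for x in X]
    for t in (0.8, 1.0, 1.2):
        print(t, round(pt_nn_density(X, y, (0.5,), t, k=30, depth=5), 3))
```

In this sketch, the precision parameter `c` controls how strongly the estimate is pulled toward the normal base measure. As `c` grows, every branching factor approaches 1 and the estimate approaches that normal density. For small `c`, the neighbors' responses dominate. The neighborhood size `k` trades variance (few neighbors) against bias from including responses observed at covariate values far from x0.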


Figs. 1–6



Acknowledgements

This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC; Grant ID: 299493 (YY) and 1280961 (LD)). Yi is a Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs Program.

Author information


Corresponding author

Correspondence to Haoxin Zhuang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 14018 KB)


About this article


Cite this article

Zhuang, H., Diao, L. & Yi, G. Polya tree-based nearest neighborhood regression. Stat Comput 32, 59 (2022). https://doi.org/10.1007/s11222-021-10076-w

  • DOI: https://doi.org/10.1007/s11222-021-10076-w
