Abstract
A discrete approximation to the Polya tree prior suitable for latent data is proposed that enjoys surprisingly simple and efficient conjugate updating. This approximation is illustrated in two applied contexts: the implementation of a nonparametric meta-analysis involving studies on the relationship between alcohol consumption and breast cancer, and random intercept Poisson regression for Ache armadillo hunting treks. The discrete approximation is then smoothed with Gaussian kernels to provide a smooth density for use with continuous data; the smoothed approximation is illustrated on a classic dataset on galaxy velocities and on recent data involving breast cancer survival in Louisiana.
Change history
18 October 2016
An erratum to this article has been published.
Additional information
An erratum to this article is available at https://doi.org/10.1007/s11222-016-9686-6.
Cite this article
Cipolli, W., Hanson, T. Computationally tractable approximate and smoothed Polya trees. Stat Comput 27, 39–51 (2017). https://doi.org/10.1007/s11222-016-9652-3
DOI: https://doi.org/10.1007/s11222-016-9652-3