Abstract
In this paper, we focus on the variable selection problem in normal regression models using the expected-posterior prior methodology. We provide a straightforward MCMC scheme for the derivation of the posterior distribution, as well as Monte Carlo estimates for the computation of the marginal likelihood and posterior model probabilities. Additionally, for large model spaces, a model search algorithm based on \(\mathit{MC}^{3}\) is constructed. The proposed methodology is applied to two real-life examples that have already been used in the literature on objective variable selection. In both examples, uncertainty over different training samples is taken into account.
References
Barbieri, M., Berger, J.: Optimal predictive model selection. Ann. Stat. 32, 870–897 (2004)
Berger, J., Molina, G.: Posterior model probabilities via path-based pairwise priors. Stat. Neerl. 59, 3–15 (2005)
Berger, J., Pericchi, L.: The intrinsic Bayes factor for model selection and prediction. J. Am. Stat. Assoc. 91, 109–122 (1996)
Casella, G., Girón, F., Martínez, M., Moreno, E.: Consistency of Bayesian procedures for variable selection. Ann. Stat. 37, 1207–1228 (2009)
Casella, G., Moreno, E.: Objective Bayesian variable selection. J. Am. Stat. Assoc. 101, 157–167 (2006)
Celeux, G., El Anbari, M., Marin, J.-M., Robert, C.P.: Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation. Bayesian Anal. (forthcoming), arXiv:1010.0300
Clyde, M., Ghosh, J., Littman, M.: Bayesian adaptive sampling for variable selection and model averaging. J. Comput. Graph. Stat. 20, 80–101 (2011)
Dellaportas, P., Forster, J., Ntzoufras, I.: Joint specification of model space and parameter space prior distributions. Statist. Sci. (2012, forthcoming). Currently available at http://www.stat-athens.aueb.gr/~jbn/papers/paper24.htm
Fernandez, C., Ley, E., Steel, M.: Benchmark priors for Bayesian model averaging. J. Econom. 100, 381–427 (2001)
Girón, F., Moreno, E., Martínez, M.: An objective Bayesian procedure for variable selection in regression. In: Balakrishnan, N., Castillo, E., Sarabia, J.M. (eds.) Advances on Distribution Theory, Order Statistics and Inference, pp. 393–408. Birkhäuser, Boston (2006)
Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Leng, C., Tran, M.N., Nott, D.: Bayesian adaptive lasso (2010), arXiv:1009.2300. Available at http://adsabs.harvard.edu/abs/2010arXiv1009.2300L
Liang, F., Paulo, R., Molina, G., Clyde, M., Berger, J.: Mixtures of g priors for Bayesian variable selection. J. Am. Stat. Assoc. 103, 410–423 (2008)
Madigan, D., York, J.: Bayesian graphical models for discrete data. Int. Stat. Rev. 63, 215–232 (1995)
Montgomery, D., Peck, E.: Introduction to Linear Regression Analysis. Wiley, New York (1982)
Moreno, E., Girón, F.: Comparison of Bayesian objective procedures for variable selection in linear regression. Test 17, 472–490 (2008)
Ntzoufras, I.: Bayesian analysis of the normal regression model. In: Böcker, K. (ed.) Rethinking Risk Measurement and Reporting: Uncertainty, Bayesian Analysis and Expert Judgment, vol. I, pp. 69–106. Risk Books (2010). ISBN-10: 1-906348-40-5, ISBN-13: 978-1-906348-40-3
Pérez, J.: Development of expected posterior prior distribution for model comparisons. Ph.D. thesis, Department of Statistics, Purdue University, USA (1998)
Pérez, J., Berger, J.: Expected-posterior prior distributions for model selection. Biometrika 89, 491–511 (2002)
Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E., Yang, N.: Prostate-specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II: radical prostatectomy treated patients. J. Urol. 141, 1076–1083 (1989)
Zellner, A.: On assessing prior distributions and Bayesian regression analysis using g-prior distributions. In: Goel, P., Zellner, A. (eds.) Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pp. 233–243. North-Holland, Amsterdam (1986)
Zellner, A., Siow, A.: Posterior odds ratios for selected regression hypotheses (with discussion). In: Bernardo, J.M., DeGroot, M.H., Lindley, D.V., Smith, A.F.M. (eds.) Bayesian Statistics, vol. 1, pp. 585–606 & 618–647 (discussion). Oxford University Press, Oxford (1980)
Appendix
Using efficient and optimized R code running under Windows on an i5-2430M processor at 2.40 GHz, we estimate that the clock time for the full enumeration search, with 1000 iterations for estimating the marginal likelihood and 100 different training sub-samples, would be approximately 100 min for Hald’s cement data and approximately 3500 min (about 2.4 days) for the prostate cancer data. By contrast, the clock time for the full enumeration R code using Zellner’s g prior was approximately 1 s for each illustration. The R programs are available upon request.
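The low cost of the g prior benchmark comes from its closed-form Bayes factor, which makes full enumeration of the \(2^{p}\) candidate models a matter of one least-squares fit per model. The following is a minimal Python sketch of such an enumeration, not the authors' R code. It assumes the standard closed-form Bayes factor against the intercept-only model (as given in Liang et al. 2008), a uniform prior over models, the choice \(g = n\), and synthetic data; the function names `log_bf_gprior` and `enumerate_models` are hypothetical.

```python
import itertools
import time

import numpy as np


def log_bf_gprior(y, X, gamma, g):
    """Log Bayes factor of model `gamma` versus the intercept-only model
    under Zellner's g prior (closed form, no Monte Carlo needed)."""
    n = len(y)
    p = int(sum(gamma))
    if p == 0:
        return 0.0  # the null model compared with itself
    cols = [j for j, included in enumerate(gamma) if included]
    Xc = X[:, cols] - X[:, cols].mean(axis=0)  # centre covariates
    yc = y - y.mean()                          # centre response
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    rss = np.sum((yc - Xc @ beta) ** 2)
    r2 = 1.0 - rss / np.sum(yc ** 2)           # coefficient of determination
    return (0.5 * (n - 1 - p) * np.log1p(g)
            - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))


def enumerate_models(y, X, g=None):
    """Posterior probabilities of all 2^p models, uniform model prior."""
    n, p = X.shape
    g = float(n) if g is None else g           # assumption: g = n
    models = list(itertools.product([0, 1], repeat=p))
    log_bf = np.array([log_bf_gprior(y, X, m, g) for m in models])
    weights = np.exp(log_bf - log_bf.max())    # stabilise before normalising
    return models, weights / weights.sum()


# Synthetic illustration (assumed data, not the paper's examples):
# only covariates 0 and 2 truly enter the regression.
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

start = time.perf_counter()
models, probs = enumerate_models(y, X)
elapsed = time.perf_counter() - start

best = models[int(np.argmax(probs))]
print(f"{len(models)} models enumerated in {elapsed:.4f} s; top model: {best}")
```

With the expected-posterior prior no such closed form is used: each marginal likelihood is estimated by Monte Carlo and the whole search is repeated over training sub-samples, so the per-model cost is multiplied by the number of iterations and by the number of sub-samples.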
Cite this article
Fouskakis, D., Ntzoufras, I. Computation for intrinsic variable selection in normal regression models via expected-posterior prior. Stat Comput 23, 491–499 (2013). https://doi.org/10.1007/s11222-012-9325-9
DOI: https://doi.org/10.1007/s11222-012-9325-9