Computation for intrinsic variable selection in normal regression models via expected-posterior prior

Statistics and Computing

Abstract

In this paper, we focus on the variable selection problem in normal regression models using the expected-posterior prior methodology. We provide a straightforward MCMC scheme for the derivation of the posterior distribution, as well as Monte Carlo estimates for the computation of the marginal likelihood and posterior model probabilities. Additionally, for large model spaces, a model search algorithm based on \(\mathit{MC}^{3}\) is constructed. The proposed methodology is applied to two real-life examples previously used in the literature on objective variable selection. In both examples, uncertainty over different training samples is taken into consideration.
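To make the \(\mathit{MC}^{3}\) idea concrete, the following is a minimal sketch of a generic \(\mathit{MC}^{3}\)-style sampler (in the sense of Madigan and York 1995): a Metropolis random walk over inclusion vectors that proposes adding or removing one covariate at a time. It is not the authors' implementation. The function `mc3`, the callable `log_post` and the toy score `toy_log_post` are hypothetical names introduced only for illustration; in the setting of the paper the unnormalised log posterior of a model would come from a Monte Carlo estimate of the marginal likelihood under the expected-posterior prior, which is not reproduced here.

```python
import math
import random
from collections import Counter


def mc3(log_post, p, n_iter, seed=0):
    """MC^3-style random walk over models coded as 0/1 inclusion tuples.

    log_post is a hypothetical callable returning the unnormalised log
    posterior of a model; it stands in for whatever marginal likelihood
    estimate and model prior are actually used.
    """
    rng = random.Random(seed)
    gamma = (0,) * p                    # start from the null model
    current = log_post(gamma)
    visits = Counter()
    for _ in range(n_iter):
        j = rng.randrange(p)            # choose one covariate uniformly
        proposal = gamma[:j] + (1 - gamma[j],) + gamma[j + 1:]
        candidate = log_post(proposal)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            gamma, current = proposal, candidate
        visits[gamma] += 1
    # Visit frequencies estimate posterior model probabilities.
    return {model: count / n_iter for model, count in visits.items()}


# Toy score (an assumption, not a real marginal likelihood): the log
# posterior drops by 5 for every covariate that disagrees with `target`.
target = (1, 0, 1, 0)


def toy_log_post(gamma):
    return -5.0 * sum(a != b for a, b in zip(gamma, target))


freqs = mc3(toy_log_post, p=4, n_iter=5000, seed=1)
best = max(freqs, key=freqs.get)
print(best, round(freqs[best], 3))
```

Because only visited models are scored, a sampler of this kind is attractive when the model space is too large to enumerate and each evaluation of the marginal likelihood is expensive.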

References

  • Barbieri, M., Berger, J.: Optimal predictive model selection. Ann. Stat. 32, 870–897 (2004)

  • Berger, J., Molina, G.: Posterior model probabilities via path-based pairwise priors. Stat. Neerl. 59, 3–15 (2005)

  • Berger, J., Pericchi, L.: The intrinsic Bayes factor for model selection and prediction. J. Am. Stat. Assoc. 91, 109–122 (1996)

  • Casella, G., Girón, F., Martínez, M., Moreno, E.: Consistency of Bayesian procedures for variable selection. Ann. Stat. 37, 1207–1228 (2009)

  • Casella, G., Moreno, E.: Objective Bayesian variable selection. J. Am. Stat. Assoc. 101, 157–167 (2006)

  • Celeux, G., El Anbari, M., Marin, J.-M., Robert, C.P.: Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation. Bayesian Anal. (forthcoming), arXiv:1010.0300

  • Clyde, M., Ghosh, J., Littman, M.: Bayesian adaptive sampling for variable selection and model averaging. J. Comput. Graph. Stat. 20, 80–101 (2011)

  • Dellaportas, P., Forster, J., Ntzoufras, I.: Joint specification of model space and parameter space prior distributions. Statist. Sci. (2012, forthcoming). Currently available at http://www.stat-athens.aueb.gr/~jbn/papers/paper24.htm

  • Fernandez, C., Ley, E., Steel, M.: Benchmark priors for Bayesian model averaging. J. Econom. 100, 381–427 (2001)

  • Girón, F., Moreno, E., Martínez, M.: An objective Bayesian procedure for variable selection in regression. In: Balakrishnan, N., Castillo, E., Sarabia, J.M. (eds.) Advances on Distribution Theory, Order Statistics and Inference, pp. 393–408. Birkhäuser, Boston (2006)

  • Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)

  • Leng, C., Tran, M.N., Nott, D.: Bayesian adaptive lasso (2010), arXiv:1009.2300. Available at http://adsabs.harvard.edu/abs/2010arXiv1009.2300L

  • Liang, F., Paulo, R., Molina, G., Clyde, M., Berger, J.: Mixtures of g priors for Bayesian variable selection. J. Am. Stat. Assoc. 103, 410–423 (2008)

  • Madigan, D., York, J.: Bayesian graphical models for discrete data. Int. Stat. Rev. 63, 215–232 (1995)

  • Montgomery, D., Peck, E.: Introduction to Linear Regression Analysis. Wiley, New York (1982)

  • Moreno, E., Girón, F.: Comparison of Bayesian objective procedures for variable selection in linear regression. Test 17, 472–490 (2008)

  • Ntzoufras, I.: Bayesian analysis of the normal regression model. In: Bocker, K. (ed.) Rethinking Risk Measurement and Reporting: Uncertainty, Bayesian Analysis and Expert Judgment: Volume I, pp. 69–106. Risk Books (2010). ISBN-10: 1-906348-40-5, ISBN-13: 978-1-906348-40-3

  • Pérez, J.: Development of expected posterior prior distribution for model comparisons. Ph.D. thesis, Department of Statistics, Purdue University, USA (1998)

  • Pérez, J., Berger, J.: Expected-posterior prior distributions for model selection. Biometrika 89, 491–511 (2002)

  • Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E., Yang, N.: Prostate-specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II: radical prostatectomy treated patients. J. Urol. 141, 1076–1083 (1989)

  • Zellner, A.: On assessing prior distributions and Bayesian regression analysis using g-prior distributions. In: Goel, P., Zellner, A. (eds.) Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pp. 233–243. North-Holland, Amsterdam (1986)

  • Zellner, A., Siow, A.: Posterior odds ratios for selected regression hypotheses (with discussion). In: Bernardo, J.M., DeGroot, M.H., Lindley, D.V., Smith, A.F.M. (eds.) Bayesian Statistics, vol. 1, pp. 585–606 & 618–647 (discussion). Oxford University Press, Oxford (1980)

Author information

Corresponding author

Correspondence to D. Fouskakis.

Appendix

Using optimized R code running under Windows on an i5-2430M processor at 2.40 GHz, we estimate that the clock time for the full enumeration search, with 1000 iterations for estimating the marginal likelihood and 100 different training sub-samples, would be approximately 100 s for Hald's cement data and approximately 3500 min (about 2.4 days) for the prostate cancer data. By contrast, the clock time for the full enumeration R code using Zellner's g-prior was approximately 1 s for each illustration. The R programs are available upon request.
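The g-prior enumeration is fast because, for fixed g, the Bayes factor of each model against the null model has a closed form in the coefficient of determination (see Liang et al. 2008), so no Monte Carlo estimation is needed. The sketch below illustrates such a full enumeration in Python rather than R. It is not the authors' program: the data are simulated, and the choice g = n, the uniform prior over models and all function names (`solve`, `r_squared`, `log_bf_vs_null`, `enumerate_models`) are assumptions made for illustration.

```python
import itertools
import math
import random


def solve(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    k = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    b = [0.0] * k
    for i in reversed(range(k)):
        s = sum(M[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (M[i][k] - s) / M[i][i]
    return b


def r_squared(cols, y):
    """R^2 of a least-squares fit of y on an intercept plus `cols`."""
    n = len(y)
    y_bar = sum(y) / n
    yc = [v - y_bar for v in y]
    if not cols:
        return 0.0
    Z = []                              # centred covariates
    for col in cols:
        m = sum(col) / n
        Z.append([v - m for v in col])
    k = len(Z)
    A = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(k)]
         for i in range(k)]
    c = [sum(a * b for a, b in zip(Z[i], yc)) for i in range(k)]
    beta = solve(A, c)
    ssr = sum(bi * ci for bi, ci in zip(beta, c))   # regression sum of squares
    return ssr / sum(v * v for v in yc)


def log_bf_vs_null(r2, n, k, g):
    """Log Bayes factor against the intercept-only model for fixed g."""
    return (0.5 * (n - 1 - k) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))


def enumerate_models(X_cols, y, g):
    """Posterior probabilities of all 2^p models under a uniform model prior."""
    n, p = len(y), len(X_cols)
    log_bf = {}
    for gamma in itertools.product((0, 1), repeat=p):
        cols = [X_cols[j] for j in range(p) if gamma[j]]
        log_bf[gamma] = log_bf_vs_null(r_squared(cols, y), n, len(cols), g)
    top = max(log_bf.values())          # subtract the maximum for stability
    weights = {m: math.exp(v - top) for m, v in log_bf.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


# Simulated data (an assumption): only covariates 0 and 2 affect y.
rng = random.Random(1)
n, p = 50, 4
X_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [1.0 + 2.0 * X_cols[0][i] - 3.0 * X_cols[2][i] + rng.gauss(0.0, 0.05)
     for i in range(n)]
probs = enumerate_models(X_cols, y, g=n)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Each model costs one small least-squares fit here, whereas the expected-posterior prior approach described above needs a Monte Carlo estimate per model and per training sub-sample, which is what makes its enumeration so much slower.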

About this article

Cite this article

Fouskakis, D., Ntzoufras, I. Computation for intrinsic variable selection in normal regression models via expected-posterior prior. Stat Comput 23, 491–499 (2013). https://doi.org/10.1007/s11222-012-9325-9

  • DOI: https://doi.org/10.1007/s11222-012-9325-9
