Abstract
This paper examines prior choice in probit regression through a predictive cross-validation criterion. In particular, we focus on situations where the number of potential covariates is far larger than the number of observations, such as in gene expression data. Such models tend to fit the observed data perfectly, and cross-validation avoids this problem by assessing predictions on held-out observations. We choose the scale parameter c in the standard variable selection prior as the minimizer of the log predictive score. Naive evaluation of the log predictive score requires substantial computational effort, and we investigate computationally cheaper methods using importance sampling. We find that K-fold importance densities perform best, in combination with either mixing over different values of c or integrating over c through an auxiliary distribution.
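As a rough illustration of the selection rule described in the abstract, the sketch below computes a log predictive score from held-out predictive probabilities and picks the candidate c with the smallest score. It assumes the score is the average negative log predictive probability of the held-out responses, and it takes the predictive probabilities as given. The function names, the grid of c values, and the numbers are hypothetical; fitting the probit model and the importance sampling schemes studied in the paper are not shown.

```python
import math


def log_predictive_score(y, p):
    """Average negative log predictive probability of the observed labels.

    y: list of 0/1 held-out responses.
    p: list of predictive probabilities P(y_i = 1), each assumed to come
       from a fit that did not use observation i (e.g. its K-fold complement).
    """
    eps = 1e-12  # guard against log(0) when a prediction is exactly 0 or 1
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    return -total / len(y)


def choose_c(y, preds_by_c):
    """Return the candidate c whose held-out predictions minimize the score."""
    return min(preds_by_c, key=lambda c: log_predictive_score(y, preds_by_c[c]))


# Hypothetical held-out predictions for three candidate values of c.
y = [1, 0, 1, 1, 0]
preds_by_c = {
    1.0: [0.55, 0.45, 0.60, 0.50, 0.50],    # cautious predictions
    10.0: [0.80, 0.20, 0.75, 0.70, 0.30],   # sharper and mostly right
    100.0: [0.99, 0.01, 0.02, 0.98, 0.97],  # overconfident, two bad misses
}
best_c = choose_c(y, preds_by_c)
```

In this toy example the overconfident predictions are penalized heavily for their two confident misses, so the intermediate value of c is selected.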
Cite this article
Lamnisos, D., Griffin, J.E. & Steel, M.F.J. Cross-validation prior choice in Bayesian probit regression with many covariates. Stat Comput 22, 359–373 (2012). https://doi.org/10.1007/s11222-011-9228-1
DOI: https://doi.org/10.1007/s11222-011-9228-1