Abstract
The Bayesian paradigm is widely acknowledged as a coherent approach to learning putative probability model structures from a finite class of candidate models. Bayesian learning measures the predictive ability of a model by its marginal data distribution, which equals the expectation of the likelihood with respect to a prior distribution for the model parameters. The main controversy surrounding this learning method stems from the need to specify proper prior distributions for all unknown parameters of a model, which is required for the marginal data distribution to be completely determined. Even for commonly used models, subjective priors may be difficult to specify precisely, and several automated learning procedures have therefore been suggested in the literature. Here we introduce a novel Bayesian learning method based on the predictive entropy of a probability model, which can combine both subjective and objective probabilistic assessments of uncertain quantities in putative models. It is shown that our approach can avoid some of the limitations of earlier objective Bayesian methods.
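As a rough illustration of the marginal data distribution described above (the expectation of the likelihood under the prior), the following sketch uses a Bernoulli model with a Beta prior. The model, the function names, and the data values are assumptions chosen for illustration and are not taken from the article; the sketch shows only the standard marginal likelihood, not the predictive-entropy method the article introduces.

```python
import math
import random


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def marginal_exact(k, n, a, b):
    """Closed-form marginal probability of one binary sequence with k ones
    among n observations, under a Beta(a, b) prior on the success probability."""
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


def marginal_monte_carlo(k, n, a, b, draws=100_000, seed=0):
    """Estimate the same quantity as the prior expectation of the likelihood:
    draw theta from the prior and average theta^k * (1 - theta)^(n - k)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        theta = rng.betavariate(a, b)
        total += theta ** k * (1.0 - theta) ** (n - k)
    return total / draws


# Hypothetical data: 7 ones in 10 observations, uniform Beta(1, 1) prior.
k, n = 7, 10
exact = marginal_exact(k, n, 1.0, 1.0)
approx = marginal_monte_carlo(k, n, 1.0, 1.0)
print(f"exact marginal: {exact:.6e}")
print(f"Monte Carlo estimate: {approx:.6e}")
```

The Monte Carlo estimate should agree with the closed form up to sampling error. Note that a proper prior is needed for the averaging step to be well defined, which is the requirement the abstract identifies as the main source of controversy.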
Corander, J., Marttinen, P. Bayesian Model Learning Based on Predictive Entropy. JoLLI 15, 5–20 (2006). https://doi.org/10.1007/s10849-005-9004-8
DOI: https://doi.org/10.1007/s10849-005-9004-8