Bayesian Model Learning Based on Predictive Entropy

Journal of Logic, Language and Information

Abstract

The Bayesian paradigm is widely acknowledged as a coherent approach to learning putative probability model structures from a finite class of candidate models. Bayesian learning measures the predictive ability of a model by the corresponding marginal data distribution, which equals the expectation of the likelihood with respect to a prior distribution for the model parameters. The main controversy surrounding this learning method stems from the need to specify proper prior distributions for all unknown parameters of a model, which is what ensures that the marginal data distribution is completely determined. Even for commonly used models, subjective priors may be difficult to specify precisely, and several automated learning procedures have therefore been suggested in the literature. Here we introduce a novel Bayesian learning method based on the predictive entropy of a probability model, which can combine subjective and objective probabilistic assessments of uncertain quantities in putative models. We show that our approach can avoid some of the limitations of earlier objective Bayesian methods.
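To make the notion of a marginal data distribution concrete, the following minimal sketch compares two candidate models for a binary sequence by their marginal likelihoods. It illustrates only the standard Bayesian learning setup described above, not the predictive-entropy method of the article, whose details are not given in the abstract. The two candidate models (a Bernoulli model with a Beta prior on its parameter, and a Bernoulli model with the parameter fixed at 0.5), the function names, and the uniform prior over models are all assumptions chosen for illustration.

```python
from math import lgamma, log, exp


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_beta_bernoulli(successes, failures, a=1.0, b=1.0):
    """Log marginal likelihood of one specific binary sequence under
    Bernoulli(theta) with theta ~ Beta(a, b).

    This is the expectation of the likelihood theta^s (1 - theta)^f
    with respect to the prior, which has the closed form
    B(a + s, b + f) / B(a, b).
    """
    return log_beta(a + successes, b + failures) - log_beta(a, b)


def log_marginal_fixed(successes, failures, theta=0.5):
    """Log likelihood of the same sequence when theta is fixed.

    With no free parameter there is nothing to integrate over, so the
    marginal likelihood equals the likelihood itself.
    """
    return successes * log(theta) + failures * log(1.0 - theta)


def posterior_model_probs(log_marginals, prior_probs=None):
    """Posterior probabilities over a finite class of candidate models.

    Assumes a uniform prior over models unless prior_probs is given.
    """
    k = len(log_marginals)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k
    log_post = [lm + log(p) for lm, p in zip(log_marginals, prior_probs)]
    m = max(log_post)  # subtract the maximum for numerical stability
    weights = [exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    s, f = 9, 1  # hypothetical data: nine successes, one failure
    log_ms = [log_marginal_beta_bernoulli(s, f), log_marginal_fixed(s, f)]
    print(posterior_model_probs(log_ms))
```

Because the marginal likelihood integrates over the prior, the result depends on the hyperparameters `a` and `b`, and an improper prior would leave it undetermined. This is the dependence on proper priors that the abstract identifies as the main source of controversy.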

Author information

Corresponding author

Correspondence to Jukka Corander.

About this article

Cite this article

Corander, J., Marttinen, P. Bayesian Model Learning Based on Predictive Entropy. JoLLI 15, 5–20 (2006). https://doi.org/10.1007/s10849-005-9004-8

  • DOI: https://doi.org/10.1007/s10849-005-9004-8
