Bayesian Model Learning Based on Predictive Entropy

Corander, Jukka; Marttinen, Pekka

doi:10.1007/s10849-005-9004-8

Published: 28 February 2006

Journal of Logic, Language and Information, Volume 15, pages 5–20 (2006)

Jukka Corander¹ &
Pekka Marttinen¹

Abstract

The Bayesian paradigm has been widely acknowledged as a coherent approach to learning putative probability model structures from a finite class of candidate models. Bayesian learning is based on measuring the predictive ability of a model in terms of the corresponding marginal data distribution, which equals the expectation of the likelihood with respect to a prior distribution for the model parameters. The main controversy related to this learning method stems from the necessity of specifying proper prior distributions for all unknown parameters of a model, which ensures that the marginal data distribution is completely determined. Even for commonly used models, subjective priors may be difficult to specify precisely, and several automated learning procedures have therefore been suggested in the literature. Here we introduce a novel Bayesian learning method, based on the predictive entropy of a probability model, that can combine subjective and objective probabilistic assessments of uncertain quantities in putative models. It is shown that our approach can avoid some of the limitations of earlier objective Bayesian methods.
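
To make the notion of a marginal data distribution concrete, the following minimal sketch computes it in closed form for two candidate models of a binary sequence. It illustrates standard Bayesian model comparison only, not the predictive entropy method of the article. The two models (a fixed success probability of 0.5, and a success probability with a proper Beta(a, b) prior), the function names, and the data are all assumptions chosen for illustration.

```python
from math import lgamma, log, exp

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_fixed(k, n, theta=0.5):
    """Log marginal likelihood of a binary sequence with k successes in n
    trials when the model has no free parameter (theta is fixed)."""
    return k * log(theta) + (n - k) * log(1 - theta)

def log_marginal_beta(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of the same sequence when theta has a proper
    Beta(a, b) prior: the likelihood averaged over the prior, which equals
    B(a + k, b + n - k) / B(a, b)."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def posterior_model_probs(log_marginals):
    """Posterior model probabilities under equal prior model probabilities,
    computed with the max-subtraction trick for numerical stability."""
    m = max(log_marginals)
    weights = [exp(v - m) for v in log_marginals]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data: 5 successes in 10 trials.
k, n = 5, 10
scores = [log_marginal_fixed(k, n), log_marginal_beta(k, n)]
probs = posterior_model_probs(scores)
print("log marginals:", scores)
print("posterior model probabilities:", probs)
```

With an improper prior the normalising constant B(a, b) would be undefined, so the marginal likelihood of the second model would be determined only up to an arbitrary factor. This is the difficulty with prior specification that the abstract refers to.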

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Helsinki, Helsinki, 68, FIN-00014, Finland
Jukka Corander & Pekka Marttinen

Corresponding author

Correspondence to Jukka Corander.

About this article

Cite this article

Corander, J., Marttinen, P. Bayesian Model Learning Based on Predictive Entropy. JoLLI 15, 5–20 (2006). https://doi.org/10.1007/s10849-005-9004-8

Received: 20 April 2005
Accepted: 24 July 2005
Published: 28 February 2006
Issue Date: July 2006
DOI: https://doi.org/10.1007/s10849-005-9004-8
