Abstract
In this paper we are interested in discrete prediction problems in a decision-theoretic setting, where the task is to compute the predictive distribution over a finite set of possible alternatives. This question is first addressed in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one possible approach for determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with Jeffreys' prior approaches the new stochastic complexity predictive distribution in the limit as the amount of sample data increases. To compare the alternative approaches in practice, each of the predictive distributions discussed is instantiated in the Bayesian network model family case. In particular, to determine Jeffreys' prior for this model family, we show how to compute the (expected) Fisher information matrix for a fixed but arbitrary Bayesian network structure. In the empirical part of the paper the predictive distributions are compared using the simple tree-structured Naive Bayes model, which is adopted in the experiments for computational reasons. The experimentation with several public domain classification datasets suggests that the evidence approach produces the most accurate predictions in the log-score sense. The evidence-based methods are also quite robust in the sense that they predict surprisingly well even when only a small fraction of the full training set is used.
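To make the distinction between the MAP and evidence approaches concrete, the following is a minimal sketch for the simplest possible case: a single multinomial variable with a symmetric Dirichlet prior (the symmetric Dirichlet with hyperparameter 1/2 is Jeffreys' prior in this one-variable case). The function names are illustrative, not taken from the paper; the MAP predictive plugs in the mode of the Dirichlet posterior, while the evidence predictive is the ratio of marginal likelihoods, which for the Dirichlet-multinomial reduces to the posterior mean.

```python
import numpy as np

def map_predictive(counts, alpha):
    """Predictive distribution obtained by fixing the parameters to the
    MAP instantiation (mode of the Dirichlet posterior).
    Valid when every count + alpha >= 1."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (counts + alpha - 1.0) / (counts.sum() + k * (alpha - 1.0))

def evidence_predictive(counts, alpha):
    """Predictive distribution obtained by integrating over all parameter
    instantiations; for Dirichlet-multinomial this is the posterior mean."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (counts + alpha) / (counts.sum() + k * alpha)

def log_score(pred, outcome):
    """Log-loss of a predictive distribution on an observed outcome
    (the log-score criterion used in the paper's comparisons)."""
    return -np.log(pred[outcome])

# Example: 4 binary observations with counts [3, 1], Jeffreys' prior alpha = 1/2.
counts, alpha = [3, 1], 0.5
print(map_predictive(counts, alpha))       # plug-in MAP predictive
print(evidence_predictive(counts, alpha))  # marginal-likelihood predictive
```

With small samples the two differ noticeably (here the MAP predictive is more extreme than the evidence predictive); as the counts grow, both converge to the empirical frequencies.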
References
Baxter R. and Oliver J. 1994. MDL and MML: similarities and differences. Technical Report 207, Department of Computer Science, Monash University.
Berger J. 1985. Statistical Decision Theory and Bayesian Analysis. New York, Springer-Verlag.
Bernardo J. and Smith A. 1994. Bayesian theory. John Wiley.
Blake C., Keogh E., and Merz C. 1998. UCI repository of machine learning databases. URL: http://www.ics.uci.edu/~mlearn/MLRepository.html.
Castillo E., Gutiérrez J., and Hadi A. 1997. Expert Systems and Probabilistic Network Models, Monographs in Computer Science. New York, NY, Springer-Verlag.
Clarke B. and Barron A. 1990. Information-theoretic asymptotics of Bayes methods. IEEE Transactions on Information Theory 36(3): 453–471.
Clarke B. and Barron A. 1994. Jeffrey's Prior is asymptotically least favorable under entropy risk. Journal of Statistical Planning and Inference 41: 37–60.
Cooper G. 1990. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence 42(2–3): 393–405.
Cooper G. and Herskovits E. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9: 309–347.
Cover T. and Thomas J. 1991. Elements of Information Theory. New York, NY, John Wiley & Sons.
DeGroot M. 1970. Optimal Statistical Decisions. McGraw-Hill.
Dom B. 1995. MDL estimation with small sample sizes including an application to the problem of segmenting binary strings using Bernoulli models. Technical Report RJ 9997 (89085), IBM Research Division, Almaden Research Center.
Friedman N. and Goldszmidt M. 1996. Building classifiers using Bayesian networks. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, pp. 1277–1284.
Geiger D. and Heckerman D. 1994. A characterization of the Dirichlet distribution through global and local independence. Technical Report MSR-TR-94-16, Microsoft Research.
Grünwald P. 1998. The minimum description length principle and reasoning under uncertainty. Ph.D. Thesis, CWI, ILLC Dissertation Series 1998-03.
Grünwald P., Kontkanen P., Myllymäki P., Silander T., and Tirri H. 1998. Minimum encoding approaches for predictive modeling. In: Cooper G. and Moral S. (Eds.), Proceedings of the 14th International Conference on Uncertainty in Artificial Intelligence (UAI'98), Madison, WI, pp. 183–192.
Heckerman D., Geiger D., and Chickering D. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20(3): 197–243.
Jensen F. 1996. An Introduction to Bayesian Networks. London, UCL Press.
Kass R. and Vos P. 1997. Geometrical Foundations of Asymptotic Inference. Wiley Interscience.
Kontkanen P., Myllymäki P., Silander T., Tirri H., and Grünwald P. 1997. Comparing predictive inference methods for discrete domains. In: Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, Florida, pp. 311–318.
Kontkanen P., Myllymäki P., Silander T., Tirri H., and Valtonen K. 1999. Exploring the robustness of Bayesian and information-theoretic methods for predictive inference. In: Heckerman D. and Whittaker J. (Eds.), Proceedings of Uncertainty '99: The Seventh International Workshop on Artificial Intelligence and Statistics, Morgan Kaufmann Publishers, pp. 231–236.
Langley P. and Sage S. 1994 Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, Oregon, pp. 399–406.
Michie D., Spiegelhalter D., and Taylor C. (Eds.), 1994. Machine Learning, Neural and Statistical Classification, London, Ellis Horwood.
Neapolitan R. 1990. Probabilistic Reasoning in Expert Systems. New York, NY, John Wiley & Sons.
Pearl J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA, Morgan Kaufmann Publishers.
Rissanen J. 1987. Stochastic complexity. Journal of the Royal Statistical Society 49(3): 223–239 and 252–265.
Rissanen J. 1989. Stochastic Complexity in Statistical Inquiry. New Jersey, World Scientific Publishing Company.
Rissanen J. 1996. Fisher information and stochastic complexity. IEEE Transactions on Information Theory 42(1): 40–47.
Shachter R. 1988. Probabilistic inference and influence diagrams. Operations Research 36(4): 589–604.
Takeuchi J. and Barron A. 1998. Asymptotically minimax regret by Bayes mixtures. In: 1998 IEEE International Symposium on Information Theory. Cambridge, MA, August 1998.
Thiesson B. 1995. Score and information for recursive exponential models with incomplete data. Technical Report R-95-2020, Aalborg University, Institute for Electronic Systems, Department of Mathematics and Computer Science.
Tirri H., Kontkanen P., and Myllymäki P. 1996. Probabilistic instance-based learning. In: Saitta L. (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (ICML'96), pp. 507–515.
Wallace C. and Boulton D. 1968. An information measure for classification. Computer Journal 11: 185–194.
Wallace C. and Freeman P. 1987. Estimation and inference by compact coding. Journal of the Royal Statistical Society 49(3): 240–265.
Wallace C., Korb K., and Dai H. 1996a. Causal discovery via MML. Technical Report 96/254, Department of Computer Science, Monash University.
Wallace C., Korb K., and Dai H. 1996b. Causal discovery via MML. In: Saitta L. (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (ICML'96), pp. 516–524.
Kontkanen, P., Myllymäki, P., Silander, T. et al. On predictive distributions and Bayesian networks. Statistics and Computing 10, 39–54 (2000). https://doi.org/10.1023/A:1008984400380