On predictive distributions and Bayesian networks

Abstract

In this paper we are interested in discrete prediction problems in a decision-theoretic setting, where the task is to compute the predictive distribution over a finite set of possible alternatives. We first address this question in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one approach to determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with Jeffreys' prior approaches the new stochastic complexity predictive distribution in the limit as the amount of sample data increases. To compare the alternative approaches in practice, each of the predictive distributions discussed is instantiated for the Bayesian network model family. In particular, to determine Jeffreys' prior for this model family, we show how to compute the (expected) Fisher information matrix for a fixed but arbitrary Bayesian network structure. In the empirical part of the paper the predictive distributions are compared using the simple tree-structured Naive Bayes model, which is adopted in the experiments for computational reasons. Experimentation with several public-domain classification datasets suggests that the evidence approach produces the most accurate predictions in the log-score sense. The evidence-based methods are also quite robust in the sense that they predict surprisingly well even when only a small fraction of the full training set is used.
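
To make the contrast between the first two approaches concrete, consider the simplest building block of a Bayesian network: a single multinomial variable with a Dirichlet prior on its parameters. The following minimal Python sketch (our illustration, not the paper's code; function names and the example counts are ours) computes the MAP-based and the evidence-based predictive distributions in their closed forms, using the fact that Jeffreys' prior for a K-valued multinomial is the symmetric Dirichlet(1/2, ..., 1/2) distribution:

```python
import numpy as np

def map_predictive(counts, alpha):
    """Predictive distribution from the single MAP parameter instantiation.

    Under a Dirichlet(alpha) prior the posterior mode, when it lies in the
    interior of the simplex, is (n_k + alpha_k - 1) / (N + alpha_0 - K).
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha - 1.0) / (counts.sum() + alpha.sum() - len(counts))

def evidence_predictive(counts, alpha):
    """Predictive distribution from integrating over all parameter instantiations.

    For the Dirichlet-multinomial model the evidence-based predictive
    probability of value k is the posterior mean (n_k + alpha_k) / (N + alpha_0).
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())

counts = [3, 0, 1]  # hypothetical counts for a 3-valued variable, small sample
print(map_predictive(counts, [1.0, 1.0, 1.0]))       # [0.75 0.   0.25]: MAP under the
                                                     # uniform prior = maximum likelihood
print(evidence_predictive(counts, [0.5, 0.5, 0.5]))  # ~[0.64 0.09 0.27]: evidence with
                                                     # Jeffreys' prior, no zero probabilities
```

Note that the MAP predictive under the uniform prior coincides with the maximum-likelihood estimate and assigns zero probability to values unseen in the sample, while the evidence predictive never does; this illustrates, in miniature, the small-sample robustness of the evidence-based methods reported in the abstract.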

References

  • Baxter R. and Oliver J. 1994. MDL and MML: similarities and differences. Technical Report 207, Department of Computer Science, Monash University.

  • Berger J. 1985. Statistical Decision Theory and Bayesian Analysis. New York, Springer-Verlag.

  • Bernardo J. and Smith A. 1994. Bayesian Theory. Chichester, John Wiley & Sons.

  • Blake C., Keogh E., and Merz C. 1998. UCI repository of machine learning databases. URL: http://www.ics.uci.edu/~mlearn/MLRepository.html.

  • Castillo E., Gutiérrez J., and Hadi A. 1997. Expert Systems and Probabilistic Network Models, Monographs in Computer Science. New York, NY, Springer-Verlag.

  • Clarke B. and Barron A. 1990. Information-theoretic asymptotics of Bayes methods. IEEE Transactions on Information Theory 36(3): 453–471.

  • Clarke B. and Barron A. 1994. Jeffreys' prior is asymptotically least favorable under entropy risk. Journal of Statistical Planning and Inference 41: 37–60.

  • Cooper G. 1990. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence 42(2–3): 393–405.

  • Cooper G. and Herskovits E. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9: 309–347.

  • Cover T. and Thomas J. 1991. Elements of Information Theory. New York, NY, John Wiley & Sons.

  • DeGroot M. 1970. Optimal Statistical Decisions. McGraw-Hill.

  • Dom B. 1995. MDL estimation with small sample sizes including an application to the problem of segmenting binary strings using Bernoulli models. Technical Report RJ 9997 (89085), IBM Research Division, Almaden Research Center.

  • Friedman N. and Goldszmidt M. 1996. Building classifiers using Bayesian networks. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, pp. 1277–1284.

  • Geiger D. and Heckerman D. 1994. A characterization of the Dirichlet distribution through global and local independence. Technical Report MSR-TR-94-16, Microsoft Research.

  • Grünwald P. 1998. The minimum description length principle and reasoning under uncertainty. Ph.D. Thesis, CWI, ILLC Dissertation Series 1998-03.

  • Grünwald P., Kontkanen P., Myllymäki P., Silander T., and Tirri H. 1998. Minimum encoding approaches for predictive modeling. In: Cooper G. and Moral S. (Eds.), Proceedings of the 14th International Conference on Uncertainty in Artificial Intelligence (UAI'98), Madison, WI, pp. 183–192.

  • Heckerman D., Geiger D., and Chickering D. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20(3): 197–243.

  • Jensen F. 1996. An Introduction to Bayesian Networks. London, UCL Press.

  • Kass R. and Vos P. 1997. Geometrical Foundations of Asymptotic Inference. Wiley Interscience.

  • Kontkanen P., Myllymäki P., Silander T., Tirri H., and Grünwald P. 1997. Comparing predictive inference methods for discrete domains. In: Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, Florida, pp. 311–318.

  • Kontkanen P., Myllymäki P., Silander T., Tirri H., and Valtonen K. 1999. Exploring the robustness of Bayesian and information-theoretic methods for predictive inference. In: Heckerman D. and Whittaker J. (Eds.), Proceedings of Uncertainty '99: The Seventh International Workshop on Artificial Intelligence and Statistics, Morgan Kaufmann Publishers, pp. 231–236.

  • Langley P. and Sage S. 1994. Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, Oregon, pp. 399–406.

  • Michie D., Spiegelhalter D., and Taylor C. (Eds.), 1994. Machine Learning, Neural and Statistical Classification, London, Ellis Horwood.

  • Neapolitan R. 1990. Probabilistic Reasoning in Expert Systems. New York, NY, John Wiley & Sons.

  • Pearl J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA, Morgan Kaufmann Publishers.

  • Rissanen J. 1987. Stochastic complexity. Journal of the Royal Statistical Society, Series B 49(3): 223–239 and 252–265.

  • Rissanen J. 1989. Stochastic Complexity in Statistical Inquiry. New Jersey, World Scientific Publishing Company.

  • Rissanen J. 1996. Fisher information and stochastic complexity. IEEE Transactions on Information Theory 42(1): 40–47.

  • Shachter R. 1988. Probabilistic inference and influence diagrams. Operations Research 36(4): 589–604.

  • Takeuchi J. and Barron A. 1998. Asymptotically minimax regret by Bayes mixtures. In: 1998 IEEE International Symposium on Information Theory. Cambridge, MA, August 1998.

  • Thiesson B. 1995. Score and information for recursive exponential models with incomplete data. Technical Report R-95-2020, Aalborg University, Institute for Electronic Systems, Department of Mathematics and Computer Science.

  • Tirri H., Kontkanen P., and Myllymäki P. 1996. Probabilistic instance-based learning. In: Saitta L. (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (ICML'96), pp. 507–515.

  • Wallace C. and Boulton D. 1968. An information measure for classification. Computer Journal 11: 185–194.

  • Wallace C. and Freeman P. 1987. Estimation and inference by compact coding. Journal of the Royal Statistical Society, Series B 49(3): 240–265.

  • Wallace C., Korb K., and Dai H. 1996a. Causal discovery via MML. Technical Report 96/254, Department of Computer Science, Monash University.

  • Wallace C., Korb K., and Dai H. 1996b. Causal discovery via MML. In: Saitta L. (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (ICML'96), pp. 516–524.

Cite this article

Kontkanen, P., Myllymäki, P., Silander, T. et al. On predictive distributions and Bayesian networks. Statistics and Computing 10, 39–54 (2000). https://doi.org/10.1023/A:1008984400380
