Abstract
This paper presents and evaluates an approach to Bayesian model averaging in which the models are Bayesian networks (BNs). A comprehensive study of the literature on structural priors for BNs is conducted. A number of prior distributions are defined using stochastic logic programs, and the Metropolis-Hastings algorithm, a Markov chain Monte Carlo (MCMC) method, is used to sample (approximately) from the posterior. We use proposals that are tightly coupled to the priors, which gives rise to cheaply computable acceptance probabilities. Experiments using data generated from known BNs have been conducted to evaluate the method. The experiments used six different BNs and varied the structural prior, the parameter prior, the Metropolis-Hastings proposal and the data size. Each experiment was repeated three times with different random seeds to test the robustness of the MCMC-produced results. Our results show that with effective priors (i) robust results are produced and (ii) informative priors improve results significantly.
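The abstract does not spell out why a proposal coupled to the prior makes acceptance probabilities cheap. The simplest instance, sketched below, is an independence sampler that proposes directly from the structural prior: since q(M'|M) = p(M'), the Metropolis-Hastings ratio p(M')P(D|M')q(M|M') / [p(M)P(D|M)q(M'|M)] reduces to P(D|M')/P(D|M), so the prior only ever has to be sampled, never evaluated. This is a minimal sketch under stated assumptions, not the authors' SLP-based method (whose proposals are more local): the three-variable network, the ordering-based prior with edge probability 0.3, the Cooper-Herskovits (K2) marginal likelihood and all function names are hypothetical choices made for illustration. Posterior edge probabilities are then estimated by averaging over the sampled DAGs.

```python
import math
import random

N_VARS = 3  # hypothetical toy example: variables 0 (A), 1 (B), 2 (C), all binary


def sample_data(n, rng):
    """Draw n rows from a hypothetical known BN A -> B -> C."""
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)
        b = int(rng.random() < (0.9 if a else 0.1))
        c = int(rng.random() < (0.8 if b else 0.2))
        rows.append((a, b, c))
    return rows


def log_marginal_likelihood(dag, data):
    """Cooper-Herskovits (K2) score: log P(data | dag) under uniform Dirichlet parameter priors."""
    total = 0.0
    for child in range(N_VARS):
        parents = sorted(p for (p, c) in dag if c == child)
        counts = {}  # parent configuration -> [count of child=0, count of child=1]
        for row in data:
            key = tuple(row[p] for p in parents)
            counts.setdefault(key, [0, 0])[row[child]] += 1
        for n0, n1 in counts.values():
            # For r = 2 states: log[(r-1)! / (N_ij + r - 1)!] + sum_k log N_ijk!
            total += math.lgamma(2) - math.lgamma(n0 + n1 + 2)
            total += math.lgamma(n0 + 1) + math.lgamma(n1 + 1)
    return total


def sample_from_prior(rng, edge_prob=0.3):
    """Hypothetical structural prior: a uniformly random node ordering, then each
    edge consistent with that ordering is included with probability edge_prob.
    Sampling is cheap; the prior's probability mass function is never needed."""
    order = list(range(N_VARS))
    rng.shuffle(order)
    dag = set()
    for i in range(N_VARS):
        for j in range(i + 1, N_VARS):
            if rng.random() < edge_prob:
                dag.add((order[i], order[j]))
    return frozenset(dag)


def mh_independence_sampler(data, n_iter, seed):
    """Metropolis-Hastings whose proposal is a fresh draw from the prior.

    Because q(M' | M) = prior(M'), the prior and proposal terms cancel and the
    acceptance probability is min(1, P(D | M') / P(D | M)).
    """
    rng = random.Random(seed)
    cache = {}  # DAG -> log marginal likelihood (only 25 DAGs exist on 3 nodes)

    def score(dag):
        if dag not in cache:
            cache[dag] = log_marginal_likelihood(dag, data)
        return cache[dag]

    current = sample_from_prior(rng)
    samples = []
    for _ in range(n_iter):
        proposal = sample_from_prior(rng)
        log_alpha = min(0.0, score(proposal) - score(current))
        if rng.random() < math.exp(log_alpha):
            current = proposal
        samples.append(current)
    return samples


def edge_posteriors(samples):
    """Model-averaged estimate: fraction of sampled DAGs containing each directed edge."""
    counts = {}
    for dag in samples:
        for edge in dag:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / len(samples) for edge, c in counts.items()}


if __name__ == "__main__":
    data = sample_data(500, random.Random(0))
    for seed in (1, 2, 3):  # repeated runs with different seeds, as a robustness check
        probs = edge_posteriors(mh_independence_sampler(data, 5000, seed))
        print(seed, sorted(probs.items(), key=lambda kv: -kv[1])[:4])
```

Running it with three seeds mirrors, on a toy scale, the abstract's robustness check: close agreement of the estimated edge probabilities across seeds is some evidence that the chain has mixed. An independence sampler mixes poorly when the prior is far from the posterior, which is one motivation for more local, prior-aware proposals on realistic networks.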
About this article
Cite this article
Angelopoulos, N., Cussens, J. Bayesian learning of Bayesian networks with informative priors. Ann Math Artif Intell 54, 53–98 (2008). https://doi.org/10.1007/s10472-009-9133-x
DOI: https://doi.org/10.1007/s10472-009-9133-x
Keywords
- Prior knowledge
- Bayesian inference
- Bayesian model averaging
- Markov chain Monte Carlo
- Loss functions
- Stochastic logic programs