
Parallell interacting MCMC for learning of topologies of graphical models

Data Mining and Knowledge Discovery

Abstract

Automated statistical learning of graphical models from data has attracted considerable interest in machine learning and related literature. Many authors have discussed or demonstrated the need for consistent stochastic search methods that are less prone than simple greedy methods to yielding locally optimal model structures. At the same time, however, most stochastic search methods are based on standard Metropolis–Hastings theory, which necessitates relatively simple random proposals and prevents the use of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for a novel type of creativity in the design of search operators. The parallel aspect of our method also illustrates how the adaptive nature of the search operators helps to avoid trapping states in the vicinity of locally optimal network topologies.



Author information

Corresponding author

Correspondence to Jukka Corander.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Corander, J., Ekdahl, M. & Koski, T. Parallell interacting MCMC for learning of topologies of graphical models. Data Min Knowl Disc 17, 431–456 (2008). https://doi.org/10.1007/s10618-008-0099-9

  • DOI: https://doi.org/10.1007/s10618-008-0099-9
