
Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood

Published in: Data Mining and Knowledge Discovery

Abstract

Learning Bayesian networks is known to be an NP-hard problem, which is why heuristic search has proven advantageous in many domains. This learning approach is computationally efficient and, even though it does not guarantee an optimal result, many previous studies have shown that it obtains very good solutions. Hill climbing algorithms are particularly popular because of their good trade-off between computational demands and the quality of the models learned. Despite this efficiency, these algorithms can be improved upon when dealing with high-dimensional datasets, and that is the goal of this paper. We present an approach that improves hill climbing algorithms by dynamically restricting the candidate solutions evaluated during the search process. This proposal, dynamic restriction, is novel because previous work on restricted search in the literature is based on two stages rather than the single stage presented here. In addition to the aforementioned advantages of hill climbing algorithms, we show that under certain conditions the model they return is a minimal I-map of the joint probability distribution underlying the training data, a theoretical property with practical implications. We provide theoretical results that guarantee that, under these same conditions, the proposed algorithms also output a minimal I-map. Furthermore, we experimentally test the proposed algorithms over a set of different domains, some of them quite large (up to 800 variables), in order to study their behavior in practice.
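To make the idea concrete, the following is a minimal sketch of score-based hill climbing with a progressively restricted neighborhood: arcs are added greedily under a decomposable BIC-style score, and any candidate parent whose addition fails to improve the score is removed from that variable's neighborhood for the rest of the search. All function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import Counter


def bic_local(data, child, parents):
    """Decomposable BIC contribution of `child` given a candidate parent set."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    ll = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    states = len({r[child] for r in data})
    return ll - 0.5 * math.log(n) * (states - 1) * max(len(marg), 1)


def creates_cycle(parents, x, y):
    """True if adding the arc x -> y would close a directed cycle."""
    stack, seen = [y], set()
    while stack:
        v = stack.pop()
        if v == x:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(c for c, ps in parents.items() if v in ps)
    return False


def hill_climb_restricted(data, n_vars):
    """Greedy arc addition with dynamic restriction of the neighborhood."""
    parents = {v: set() for v in range(n_vars)}
    forbidden = {v: set() for v in range(n_vars)}  # candidates pruned so far
    score = {v: bic_local(data, v, ()) for v in range(n_vars)}
    while True:
        best = None
        for y in range(n_vars):
            for x in range(n_vars):
                if x == y or x in parents[y] or x in forbidden[y]:
                    continue
                if creates_cycle(parents, x, y):
                    continue
                delta = bic_local(data, y, sorted(parents[y] | {x})) - score[y]
                if delta <= 0:
                    # dynamic restriction: a non-improving candidate parent
                    # is dropped from y's neighborhood for future iterations
                    forbidden[y].add(x)
                elif best is None or delta > best[0]:
                    best = (delta, x, y)
        if best is None:
            return parents
        delta, x, y = best
        parents[y].add(x)
        score[y] += delta


# toy demo: X0 drives X1, X2 is independent noise
random.seed(0)
data = []
for _ in range(500):
    x0 = random.randint(0, 1)
    x1 = x0 if random.random() < 0.9 else 1 - x0
    x2 = random.randint(0, 1)
    data.append((x0, x1, x2))
dag = hill_climb_restricted(data, 3)
```

Note that the sketch only performs arc additions; a full hill climber would also consider deletions and reversals, and the full proposal applies the restriction across all such operations.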



Author information

Correspondence to Juan L. Mateo.

Additional information

Responsible editor: Charles Elkan.


About this article

Cite this article

Gámez, J.A., Mateo, J.L. & Puerta, J.M. Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood. Data Min Knowl Disc 22, 106–148 (2011). https://doi.org/10.1007/s10618-010-0178-6
