Abstract
Learning Bayesian networks is known to be an NP-hard problem, which is why heuristic search has proven advantageous in many domains. This learning approach is computationally efficient and, even though it does not guarantee an optimal result, many previous studies have shown that it obtains very good solutions. Hill climbing algorithms are particularly popular because of their good trade-off between computational demands and the quality of the models learned. In spite of this efficiency, these algorithms can still be improved upon when dealing with high-dimensional datasets, and that is the goal of this paper. We present an approach to improving hill climbing algorithms based on dynamically restricting the candidate solutions to be evaluated during the search process. This proposal, dynamic restriction, is new because other studies on restricted search available in the literature are based on two stages rather than the single stage presented here. In addition to the aforementioned advantages of hill climbing algorithms, we show that under certain conditions the model they return is a minimal I-map of the joint probability distribution underlying the training data, which is a desirable theoretical property with practical implications. We provide theoretical results guaranteeing that, under these same conditions, the proposed algorithms also output a minimal I-map. Furthermore, we experimentally test the proposed algorithms over a set of different domains, some of them quite large (up to 800 variables), in order to study their behavior in practice.
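To make the idea of restricting candidates during a single search stage more concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the algorithms proposed in the paper: the names `restricted_hill_climbing`, `creates_cycle`, `local_score` and `toy_score` are hypothetical, the rule used here (forbid a pair of variables in both directions once adding an arc between them fails to improve the score) is an assumed simplification, edge reversal is omitted for brevity, and the toy score stands in for a real decomposable metric computed from data.

```python
from itertools import permutations


def creates_cycle(parents, child, new_parent):
    """Return True if adding new_parent -> child would close a directed cycle."""
    # A cycle appears exactly when child is already an ancestor of new_parent.
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def restricted_hill_climbing(nodes, local_score):
    """Greedy search over DAGs with a neighborhood that shrinks as it runs.

    local_score(node, parent_set) is a hypothetical decomposable score.
    Only arc additions and deletions are considered in this sketch.
    """
    parents = {v: frozenset() for v in nodes}
    forbidden = {v: set() for v in nodes}  # assumed restriction structure
    evaluations = 0
    while True:
        best_gain, best_move = 0.0, None
        for x, y in permutations(nodes, 2):  # candidate arc x -> y
            if x in parents[y]:
                new_parents, op = parents[y] - {x}, "delete"
            elif x in forbidden[y] or creates_cycle(parents, y, x):
                continue  # restricted or invalid: never scored
            else:
                new_parents, op = parents[y] | {x}, "add"
            gain = local_score(y, new_parents) - local_score(y, parents[y])
            evaluations += 1
            if op == "add" and gain <= 0:
                # Assumption: an unhelpful addition removes the pair
                # from the neighborhood for the rest of the search.
                forbidden[y].add(x)
                forbidden[x].add(y)
                continue
            if gain > best_gain:
                best_gain, best_move = gain, (y, new_parents)
        if best_move is None:
            return parents, forbidden, evaluations
        child, new_parents = best_move
        parents[child] = new_parents


# Toy symmetric score (illustrative only): A-B and B-C are related, A-C is not.
linked = {frozenset("AB"), frozenset("BC")}


def toy_score(node, parent_set):
    return sum(1.5 if frozenset((p, node)) in linked else -0.5
               for p in parent_set)


structure, forbidden, n_evals = restricted_hill_climbing(["A", "B", "C"], toy_score)
print(structure, forbidden, n_evals)
```

In this sketch the pair A, C is scored once, found unhelpful, and never evaluated again, which is where the savings over an unrestricted neighborhood would come from on larger domains.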
Additional information
Responsible editor: Charles Elkan.
Cite this article
Gámez, J.A., Mateo, J.L. & Puerta, J.M. Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood. Data Min Knowl Disc 22, 106–148 (2011). https://doi.org/10.1007/s10618-010-0178-6
DOI: https://doi.org/10.1007/s10618-010-0178-6