Abstract
This paper addresses the problem of efficiently finding a Bayesian network structure that maximizes the posterior probability. In particular, we focus on the branch-and-bound (B&B) strategy to save the computational effort of finding the largest score. To make the search more efficient, we need a tighter upper bound, so that the current best score can exceed it more easily and more branches can be pruned. We derive two upper bounds and prove that they are tighter than the existing one (Campos and Ji, J Mach Learn Res 12(3):663–689, 2011). Finally, we demonstrate on the Alarm and Insurance data sets that the proposed bounds make the search much more efficient. For example, the search is two to three times faster for \(n=100\) and almost twice as fast for \(n=500\). We also verify experimentally that the overhead of replacing the existing pruning rule with the proposed one is negligible.



References
Beinlich, I.A., Suermondt, H.J., Chavez, R.M., Cooper, G.F.: The ALARM monitoring system: a case study with two probabilistic inference techniques for belief networks. In: The 2nd European Conference on Artificial Intelligence in Medicine, pp. 247–256. Springer, London (1989)
Binder, J., Koller, D., Russell, S., Kanazawa, K.: Adaptive probabilistic networks with hidden variables. Mach. Learn. 29(2–3), 213–244 (1997)
Buntine, W.: Theory refinement on Bayesian networks. In: Uncertainty in Artificial Intelligence, pp. 52–60. Morgan Kaufmann, Los Angeles (1991)
Campos, C.P., Ji, Q.: Efficient structure learning of Bayesian networks using constraints. J. Mach. Learn. Res. 12(3), 663–689 (2011)
Chickering, D.M., Meek, C., Heckerman, D.: Large-sample learning of Bayesian networks is NP-hard. In: Uncertainty in Artificial Intelligence, pp. 124–133. Morgan Kaufmann, Acapulco (2003)
Cooper, G.F., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9(4), 309–347 (1992)
Cussens, J., Bartlett, M.: GOBNILP 1.6.2 User/Developer Manual. University of York, York (2015)
Fan, X., Malone, B., Yuan, C.: Finding optimal Bayesian network structures with constraints learned from data. In: Uncertainty in Artificial Intelligence, pp. 200–209. AUAI Press, Corvallis (2014)
Jeffreys, H.: Theory of Probability. Oxford University Press, Oxford (1939)
Krichevsky, R.E., Trofimov, V.K.: The performance of universal encoding. IEEE Trans. Inf. Theory IT-27(2), 199–207 (1981)
Ott, S., Imoto, S., Miyano, S.: Finding optimal models for small gene networks. Pac. Symp. Biocomput. 9, 557–567 (2004)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Representation and Reasoning), 2nd ed. Morgan Kaufmann, Burlington (1988)
Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)
Silander, T., Myllymaki, P.: A simple approach for finding the globally optimal Bayesian network structure. In: Uncertainty in Artificial Intelligence, pp. 445–452. Morgan Kaufmann, Arlington (2006)
Singh, A.P., Moore, A.W.: Finding optimal Bayesian networks by dynamic programming. Technical Report, Carnegie Mellon University (2005)
Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search. Springer, Berlin (1993)
Suzuki, J.: A construction of Bayesian networks from databases based on an MDL principle. In: Uncertainty in Artificial Intelligence, pp. 266–273. Morgan Kaufmann, Washington DC (1993)
Suzuki, J.: Learning Bayesian belief networks based on the minimum description length principle: an efficient algorithm using the B&B technique. In: International Conference on Machine Learning, pp. 462–470. Morgan Kaufmann, Bari (1996)
Suzuki, J.: Efficiently learning Bayesian network structures based on the B&B strategy: a theoretical analysis. In: Advanced Methodologies for Bayesian Networks, Yokohama, Japan (2015). Also published as Lecture Notes in Artificial Intelligence 9095. Springer, Berlin (2016)
Tian, J.: A branch-and-bound algorithm for MDL learning Bayesian networks. In: Uncertainty in Artificial Intelligence, pp. 580–588. Morgan Kaufmann, Stanford (2000)
Ueno, M.: Learning networks determined by the ratio of prior and data. In: Uncertainty in Artificial Intelligence, pp. 598–605 (2010)
Acknowledgements
The author wishes to express his gratitude to Dr. Jun Kawahara of the Nara Institute of Science and Technology for correcting the program of the proposed algorithm.
Appendix
We assume that z takes on a value in \(\{1,\ldots ,\gamma \}\) with \(\gamma \ge 2\).
Proof of Lemma 1
In general,
for \(j=1,\ldots ,c(z)\) and \(z=1,\ldots ,\gamma \). By multiplying both sides over all \(j=1,\ldots ,c(z)\), we have
for \(z=1,\ldots ,\gamma \). By further multiplying both sides over all \(z=1,\ldots ,\gamma \), we obtain the lemma.
Proof of Lemma 2
Note that, in general, for \(0<p<q\), the function \(f_{p,q}(u):=\frac{u+p}{u+q}\) satisfies \(f_{p,q}(u)<1\) and is monotonically increasing in \(u\ge 0\). Thus,
for \(j=1,\ldots ,c(z)\) and \(z=1,\ldots ,\gamma \). By multiplying both sides over all \(j=1,\ldots ,c(z)\), we have
for \(z=1,\ldots ,\gamma \). By further multiplying both sides over all \(z=1,\ldots ,\gamma \), we obtain the lemma.
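The monotonicity of \(f_{p,q}\) claimed at the start of this proof can be verified directly by differentiation (a one-line check, using only the assumption \(0<p<q\)):

```latex
f_{p,q}'(u)
  = \frac{(u+q)-(u+p)}{(u+q)^2}
  = \frac{q-p}{(u+q)^2} > 0
  \quad\text{for } u\ge 0,
```

and \(f_{p,q}(u)<1\) follows immediately from \(u+p<u+q\).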
Proof of Lemma 3
Similar to the proof of Lemma 2, we have
for \(j=1,\ldots ,c(z)\) and \(z=1,\ldots ,\gamma \). By multiplying both sides over all \(j=1,\ldots ,c(z)\), we have
for \(z=1,\ldots ,\gamma \). By further multiplying both sides over all \(z=1,\ldots ,\gamma \), we obtain the lemma.
Proof of Theorem 5
We prove
which is equivalent to
We regard (27) as a function of \(\alpha \ge 1\). We find that both sides are equal when \(\alpha =1\), that \(\Gamma (1/2)^{\alpha \beta }/\Gamma (\alpha \beta )\) decreases with \(\alpha \ge 1\), and that \(B(x+r_1,\ldots ,x+r_m)\) with constants \(r_1,\ldots ,r_m>0\) decreases with \(x>0\), where \(B(r_1,\ldots ,r_m)\) is the multivariate Beta function defined by \(\frac{\prod _{i=1}^m \Gamma (r_i)}{\Gamma (\sum _{i=1}^m r_i)}\). These three facts imply the theorem.
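The third fact, that \(B(x+r_1,\ldots ,x+r_m)\) decreases in \(x\), can be sketched via the digamma function \(\psi =(\log \Gamma )'\) (a verification sketch, assuming \(m\ge 2\)):

```latex
\frac{d}{dx}\log B(x+r_1,\ldots ,x+r_m)
  = \sum_{i=1}^{m}\psi (x+r_i)
    \;-\; m\,\psi\!\Bigl(mx+\sum_{j=1}^{m} r_j\Bigr) \;<\; 0,
```

since \(\psi \) is strictly increasing on \((0,\infty )\) and \(x+r_i < mx+\sum_{j=1}^{m} r_j\) for every \(i\) when \(m\ge 2\) and all \(r_j>0\); hence each term of the sum is smaller than \(\psi (mx+\sum_j r_j)\).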
About this article
Cite this article
Suzuki, J. An Efficient Bayesian Network Structure Learning Strategy. New Gener. Comput. 35, 105–124 (2017). https://doi.org/10.1007/s00354-016-0007-6