
An Efficient Bayesian Network Structure Learning Strategy


Abstract

This paper addresses the problem of efficiently finding an optimal Bayesian network structure that maximizes the posterior probability. In particular, we focus on the branch-and-bound (B&B) strategy to save the computational effort associated with finding the largest score. To make the search more efficient, we need a tighter upper bound so that the current score can exceed it more easily. We derive two upper bounds and prove that they are tighter than the existing one (Campos and Ji, J Mach Learn Res 12(3):663–689, 2011). Finally, we demonstrate, using the Alarm and Insurance data sets, that the two proposed bounds make the search much more efficient: for example, the search is two to three times faster for \(n=100\) and almost twice as fast for \(n=500\). We also verify experimentally that the overhead due to replacing the existing pruning rule with the proposed one is negligible.


References

  1. Beinlich, I.A., Suermondt, H.J., Chavez, R.M., Cooper, G.F.: The ALARM monitoring system: a case study with two probabilistic inference techniques for belief networks. In: The 2nd European Conference on Artificial Intelligence in Medicine, pp. 247–256. Springer, London (1989)

  2. Binder, J., Koller, D., Russell, S., Kanazawa, K.: Adaptive probabilistic networks with hidden variables. Mach. Learn. 29(2–3), 213–244 (1997)

  3. Buntine, W.: Theory refinement on Bayesian networks. In: Uncertainty in Artificial Intelligence, pp. 52–60. Morgan Kaufmann, Los Angeles (1991)

  4. Campos, C.P., Ji, Q.: Efficient structure learning of Bayesian networks using constraints. J. Mach. Learn. Res. 12(3), 663–689 (2011)

  5. Chickering, D.M., Meek, C., Heckerman, D.: Large-sample learning of Bayesian networks is NP-hard. In: Uncertainty in Artificial Intelligence, pp. 124–133. Morgan Kaufmann, Acapulco (2003)

  6. Cooper, G.F., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9(4), 309–347 (1992)

  7. Cussens, J., Bartlett, M.: GOBNILP 1.6.2 User/Developer Manual. University of York, York (2015)

  8. Fan, X., Malone, B., Yuan, C.: Finding optimal Bayesian network structures with constraints learned from data. In: Uncertainty in Artificial Intelligence, pp. 200–209. AUAI Press, Corvallis (2014)

  9. Jeffreys, H.: Theory of Probability. Oxford University Press, Oxford (1939)

  10. Krichevsky, R.E., Trofimov, V.K.: The performance of universal encoding. IEEE Trans. Inf. Theory IT-27(2), 199–207 (1981)

  11. Ott, S., Imoto, S., Miyano, S.: Finding optimal models for small gene networks. Pac. Symp. Biocomput. 9, 557–567 (2004)

  12. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Representation and Reasoning), 2nd ed. Morgan Kaufmann, Burlington (1988)

  13. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

  14. Silander, T., Myllymaki, P.: A simple approach for finding the globally optimal Bayesian network structure. In: Uncertainty in Artificial Intelligence, pp. 445–452. Morgan Kaufmann, Arlington (2006)

  15. Singh, A.P., Moore, A.W.: Finding optimal Bayesian networks by dynamic programming. Technical Report, Carnegie Mellon University (2005)

  16. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search. Springer, Berlin (1993)

  17. Suzuki, J.: A construction of Bayesian networks from databases based on an MDL principle. In: Uncertainty in Artificial Intelligence, pp. 266–273. Morgan Kaufmann, Washington DC (1993)

  18. Suzuki, J.: Learning Bayesian belief networks based on the minimum description length principle: an efficient algorithm using the b & b technique. In: International Conference on Machine Learning, pp. 462–470. Morgan Kaufmann, Bari (1996)

  19. Suzuki, J.: Efficiently learning Bayesian network structures based on the b&b strategy: a theoretical analysis. In: Advanced Methodologies for Bayesian Networks, Yokohama, Japan (2015). Published also as Lecture Notes on Artificial Intelligence 9095. Springer, Berlin (2016)

  20. Tian, J.: A branch-and-bound algorithm for MDL learning Bayesian networks. In: Uncertainty in Artificial Intelligence, pp. 580–588. Morgan Kaufmann, Stanford (2000)

  21. Ueno, M.: Learning networks determined by the ratio of prior and data. In: Uncertainty in Artificial Intelligence, pp. 598–605 (2010)

Acknowledgements

The author wishes to express his gratitude to Dr. Jun Kawahara of the Nara Institute of Science and Technology for correcting the program implementing the proposed algorithm.

Corresponding author

Correspondence to Joe Suzuki.

Appendix

We assume that z takes on a value in \(\{1,\ldots ,\gamma \}\) with \(\gamma \ge 2\).

Proof of Lemma 1

In general,

$$\begin{aligned} c(z)+a-j\le \sum _{z'=z}^{\gamma }c(z')+a-j \end{aligned}$$

for \(j=1,\ldots ,c(z)\) and \(z=1,\ldots ,\gamma \). By multiplying both sides over all \(j=1,\ldots ,c(z)\), we have

$$\begin{aligned} \frac{\Gamma (c(z)+a)}{\Gamma (a)}\le \frac{\Gamma (\sum _{z'=z}^{\gamma }c(z')+a)}{\Gamma (\sum _{z'=z+1}^{\gamma }c(z')+a)} \end{aligned}$$

for \(z=1,\ldots ,\gamma \). By further multiplying both sides over all \(z=1,\ldots ,\gamma \), we obtain the lemma.
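
Both multiplications telescope on the right-hand side, so the resulting inequality reads \(\prod _{z=1}^{\gamma }\frac{\Gamma (c(z)+a)}{\Gamma (a)}\le \frac{\Gamma (n+a)}{\Gamma (a)}\) with \(n=\sum _{z=1}^{\gamma }c(z)\). As an illustrative sanity check (not part of the proof; the counts and the value of \(a\) are arbitrary test data), this final form can be verified numerically in log space:

```python
import math
import random

def lemma1_holds(counts, a, tol=1e-9):
    """Check prod_z Gamma(c(z)+a)/Gamma(a) <= Gamma(n+a)/Gamma(a),
    with n = sum of counts, computed via lgamma for numerical stability."""
    n = sum(counts)
    lhs = sum(math.lgamma(c + a) - math.lgamma(a) for c in counts)
    rhs = math.lgamma(n + a) - math.lgamma(a)
    return lhs <= rhs + tol

random.seed(0)
ok = all(
    lemma1_holds([random.randint(0, 30) for _ in range(random.randint(2, 8))],
                 random.choice([0.25, 0.5, 1.0, 2.0]))
    for _ in range(2000)
)
print(ok)  # True on every sampled instance, as the lemma guarantees
```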

Proof of Lemma 2

Note that, in general, for \(0<p<q\), the function \(f_{p,q}(u):=\frac{u+p}{u+q}<1\) is monotonically increasing in \(u\ge 0\). Thus,

$$\begin{aligned} \frac{c(z)+a-j}{c(z)+b-j}\le \frac{\sum _{z'=z}^\gamma c(z')+a-j}{\sum _{z'=z}^\gamma c(z')+b-j} \end{aligned}$$

for \(j=1,\ldots ,c(z)\) and \(z=1,\ldots ,\gamma \). By multiplying both sides over all \(j=1,\ldots ,c(z)\), we have

$$\begin{aligned}&\frac{\Gamma (c(z)+a)}{\Gamma (a)}\cdot \frac{\Gamma (b)}{\Gamma (c(z)+b)}\\&\quad \le \frac{\Gamma (\sum _{z'=z}^\gamma c(z')+a)}{\Gamma (\sum _{z'=z+1}^\gamma c(z')+a)}\cdot \frac{\Gamma (\sum _{z'=z+1}^\gamma c(z')+b)}{\Gamma (\sum _{z'=z}^\gamma c(z')+b)} \\ \end{aligned}$$

for \(z=1,\ldots ,\gamma \). By further multiplying both sides over all \(z=1,\ldots ,\gamma \), we obtain the lemma.
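
As in Lemma 1, both sides telescope, yielding \(\prod _{z=1}^{\gamma }\frac{\Gamma (c(z)+a)\,\Gamma (b)}{\Gamma (a)\,\Gamma (c(z)+b)}\le \frac{\Gamma (n+a)\,\Gamma (b)}{\Gamma (a)\,\Gamma (n+b)}\) for \(0<a<b\) and \(n=\sum _{z}c(z)\). A numerical sanity check of this final form (illustrative only; the test values are arbitrary):

```python
import math
import random

def lemma2_holds(counts, a, b, tol=1e-9):
    """Check, for 0 < a < b, that
    prod_z [Gamma(c(z)+a)/Gamma(a)] * [Gamma(b)/Gamma(c(z)+b)]
      <= [Gamma(n+a)/Gamma(a)] * [Gamma(b)/Gamma(n+b)],  n = sum c(z)."""
    n = sum(counts)
    term = lambda c: (math.lgamma(c + a) - math.lgamma(a)
                      + math.lgamma(b) - math.lgamma(c + b))
    return sum(term(c) for c in counts) <= term(n) + tol

random.seed(1)
ok = all(
    lemma2_holds([random.randint(0, 30) for _ in range(random.randint(2, 8))],
                 a, a + random.uniform(0.1, 3.0))
    for _ in range(2000)
    for a in [random.uniform(0.1, 2.0)]
)
print(ok)
```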

Proof of Lemma 3

Similar to the proof of Lemma 2, we have

$$\begin{aligned} \frac{c(z)+a(z)-j}{c(z)+\sum _{z'=1}^\gamma a(z')-j}\le & {} \frac{\sum _{z'=z}^\gamma c(z')+a(z)-j}{\sum _{z'=z}^\gamma c(z')+\sum _{z'=1}^\gamma a(z')-j}\\\le & {} \frac{\sum _{z'=z}^\gamma c(z')+\max _{z'} a(z')-j}{\sum _{z'=z}^\gamma c(z')+\sum _{z'=1}^\gamma a(z')-j} \end{aligned}$$

for \(j=1,\ldots ,c(z)\) and \(z=1,\ldots ,\gamma \). By multiplying both sides over all \(j=1,\ldots ,c(z)\), we have

$$\begin{aligned}&\frac{\Gamma (c(z)+a(z))}{\Gamma (a(z))}\cdot \frac{\Gamma (\sum _{z'=1}^\gamma a(z'))}{\Gamma (c(z)+\sum _{z'=1}^\gamma a(z'))}\\&\quad \le \frac{\sum _{z'=z}^\gamma c(z')+\max _{z'} a(z')-1}{\sum _{z'=z}^\gamma c(z')+\sum _{z'=1}^\gamma a(z')-1}\cdots \frac{\sum _{z'=z+1}^\gamma c(z')+\max _{z'} a(z')}{\sum _{z'=z+1}^\gamma c(z')+\sum _{z'=1}^\gamma a(z')}\\&\quad = \frac{\Gamma (\sum _{z'=z}^\gamma c(z')+\max _{z'}a(z'))}{\Gamma (\sum _{z'=z+1}^{\gamma } c(z')+\max _{z'} a(z'))}\cdot \frac{\Gamma (\sum _{z'=z+1}^{\gamma } c(z')+\sum _{z'=1}^\gamma a(z'))}{\Gamma (\sum _{z'=z}^\gamma c(z')+\sum _{z'=1}^\gamma a(z'))} \end{aligned}$$

for \(z=1,\ldots ,\gamma \). By further multiplying both sides over all \(z=1,\ldots ,\gamma \), we obtain the lemma.
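
Writing \(A:=\sum _{z'=1}^{\gamma }a(z')\) and \(a_{\max }:=\max _{z'}a(z')\), the telescoped inequality is \(\prod _{z=1}^{\gamma }\frac{\Gamma (c(z)+a(z))}{\Gamma (a(z))}\cdot \frac{\Gamma (A)}{\Gamma (c(z)+A)}\le \frac{\Gamma (n+a_{\max })}{\Gamma (a_{\max })}\cdot \frac{\Gamma (A)}{\Gamma (n+A)}\). A numerical sanity check of this final form (illustrative only; the test values are arbitrary):

```python
import math
import random

def lemma3_holds(counts, a, tol=1e-9):
    """Check prod_z [Gamma(c(z)+a(z))/Gamma(a(z))]*[Gamma(A)/Gamma(c(z)+A)]
      <= [Gamma(n+a_max)/Gamma(a_max)] * [Gamma(A)/Gamma(n+A)],
    where A = sum of a(z), a_max = max a(z), n = sum of counts."""
    n, A, a_max = sum(counts), sum(a), max(a)
    lhs = sum(math.lgamma(c + az) - math.lgamma(az)
              + math.lgamma(A) - math.lgamma(c + A)
              for c, az in zip(counts, a))
    rhs = (math.lgamma(n + a_max) - math.lgamma(a_max)
           + math.lgamma(A) - math.lgamma(n + A))
    return lhs <= rhs + tol

random.seed(2)
ok = all(
    lemma3_holds([random.randint(0, 30) for _ in range(g)],
                 [random.uniform(0.1, 2.0) for _ in range(g)])
    for _ in range(2000)
    for g in [random.randint(2, 8)]
)
print(ok)
```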

Proof of Theorem 5

We prove

$$\begin{aligned}&\frac{\Gamma (\beta /2)}{\Gamma (n+\beta /2)}\prod _y \frac{\Gamma (c(y)+1/2)}{\Gamma (1/2)}\cdot \prod _y \left\{ \frac{\Gamma (\alpha /2)}{\Gamma (c(y)+\alpha /2)} \prod _x \frac{\Gamma (c(x,y)+1/2)}{\Gamma (1/2)}\right\} \\&\quad \ge \frac{\Gamma (\alpha \beta /2)}{\Gamma (n+\alpha \beta /2)}\prod _x\prod _y \frac{\Gamma (c(x,y)+1/2)}{\Gamma (1/2)}, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \frac{\prod _y \Gamma (c(y)+1/2)}{\Gamma (n+\beta /2)}\cdot \frac{\Gamma (\beta /2)}{\Gamma (1/2)^\beta } \ge \frac{\prod _y \Gamma (c(y)+\alpha /2)}{\Gamma (n+\alpha \beta /2)}\cdot \frac{\Gamma (\alpha \beta /2)}{\Gamma (1/2)^{\alpha \beta }} \end{aligned}$$
(27)

We regard (27) as a function of \(\alpha \ge 1\). We find that both sides are equal when \(\alpha =1\), that \(\Gamma (1/2)^{\alpha \beta }/\Gamma (\alpha \beta )\) decreases with \(\alpha \ge 1\), and that \(B(x+r_1,\ldots ,x+r_m)\) with constants \(r_1,\ldots ,r_m>0\) decreases with \(x>0\), where \(B(r_1,\ldots ,r_m)\) is the Beta function defined by \(\frac{\prod _{i=1}^m \Gamma (r_i)}{\Gamma (\sum _{i=1}^m r_i)}\). These three facts imply the theorem.
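
Inequality (27) itself can be checked numerically: with \(\beta \) the number of values of \(y\), \(n=\sum _y c(y)\), and \(\alpha \ge 1\), the left-hand side should dominate the right. The following sketch (illustrative only; the counts and the values of \(\alpha \) are arbitrary test data) evaluates both sides in log space:

```python
import math
import random

LOG_G_HALF = math.lgamma(0.5)  # log Gamma(1/2) = log sqrt(pi)

def ineq27_holds(counts, alpha, tol=1e-9):
    """Check (27): with beta = len(counts), n = sum(counts),
    prod_y Gamma(c(y)+1/2)/Gamma(n+beta/2) * Gamma(beta/2)/Gamma(1/2)^beta
      >= prod_y Gamma(c(y)+alpha/2)/Gamma(n+alpha*beta/2)
         * Gamma(alpha*beta/2)/Gamma(1/2)^(alpha*beta)."""
    beta, n = len(counts), sum(counts)
    lhs = (sum(math.lgamma(c + 0.5) for c in counts)
           - math.lgamma(n + beta / 2)
           + math.lgamma(beta / 2) - beta * LOG_G_HALF)
    rhs = (sum(math.lgamma(c + alpha / 2) for c in counts)
           - math.lgamma(n + alpha * beta / 2)
           + math.lgamma(alpha * beta / 2) - alpha * beta * LOG_G_HALF)
    return lhs >= rhs - tol

random.seed(3)
ok = all(
    ineq27_holds([random.randint(0, 20) for _ in range(random.randint(2, 6))],
                 random.choice([1, 2, 3, 4]))
    for _ in range(2000)
)
print(ok)
```

Equality holds (up to floating-point tolerance) whenever \(\alpha =1\), matching the first of the three facts above.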

Cite this article

Suzuki, J. An Efficient Bayesian Network Structure Learning Strategy. New Gener. Comput. 35, 105–124 (2017). https://doi.org/10.1007/s00354-016-0007-6
