Abstract
We study BIC-like model selection criteria and, in particular, their refinements that include a constant term involving the Fisher information matrix. We perform numerical simulations that enable increasingly accurate approximation of this constant in the case of Bayesian networks. We observe that for complex Bayesian network models, the constant term is a negative number with a very large absolute value that dominates the other terms for small and moderate sample sizes. For networks with a fixed number of parameters, d, the leading term in the complexity penalty, which is proportional to d, is the same. However, as we show, the constant term can vary significantly depending on the network structure even if the number of parameters is fixed. Based on our experiments, we conjecture that the distribution of the nodes’ outdegrees is a key factor. Furthermore, we demonstrate that the constant term can have a dramatic effect on model selection performance for small sample sizes.
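To make the role of the constant concrete, the following minimal Python sketch (ours, not from the paper) evaluates it for the one case where the Fisher information integral has a closed form: a single K-category multinomial, for which the integral of the square root of the Fisher information determinant over the probability simplex equals pi^(K/2) / Gamma(K/2). It compares the plain BIC penalty with Rissanen's refined penalty (d/2) log(n / 2 pi) + log FII, in bits as per the paper's base-2 convention. For Bayesian networks no such closed form is available, which is why the paper approximates the constant numerically.

import math

def fii_log2_multinomial(K):
    # log2 of the Fisher information integral for a K-category multinomial:
    # the integral over the simplex of sqrt(det I(theta)) has the closed form
    # pi^(K/2) / Gamma(K/2), the Dirichlet(1/2, ..., 1/2) normalizer.
    return ((K / 2) * math.log(math.pi) - math.lgamma(K / 2)) / math.log(2)

def bic_penalty(K, n):
    # Standard BIC complexity penalty in bits, with d = K - 1 free parameters.
    return ((K - 1) / 2) * math.log2(n)

def refined_penalty(K, n):
    # Refined penalty in bits: (d/2) log2(n / (2 pi)) + log2 FII + o(1).
    return ((K - 1) / 2) * math.log2(n / (2 * math.pi)) + fii_log2_multinomial(K)

for n in (10, 100, 1000):
    print(n, round(bic_penalty(4, n), 2), round(refined_penalty(4, n), 2))

For the multinomial the two penalties stay close; the paper's point is that for complex Bayesian networks the constant becomes large and negative, so the refined penalty can differ drastically from plain BIC at small n.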





Notes
We denote the binary (base-2) logarithm by \(\log \) and the natural logarithm by \(\ln \).
Based on the observations in Sect. 4, which show that Bayesian networks with the same number of parameters can have large differences in FII values, we evaluate the constant separately for each network instead of applying the same complexity penalty to all networks with a given number of parameters (see the sketch below).
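The per-network constant enters the selection score additively, as in this hypothetical sketch; the likelihoods, parameter counts, and FII constants below are illustrative placeholders, not values from the paper.

import math

def refined_score(neg_log2_lik, d, n, fii_log2):
    # Total code length in bits (smaller is better): the data's code length
    # under the maximum-likelihood parameters plus the refined penalty with
    # the network-specific constant fii_log2.
    return neg_log2_lik + (d / 2) * math.log2(n / (2 * math.pi)) + fii_log2

# Two hypothetical networks with the same number of parameters (d = 30) but
# different structures and hence, per Sect. 4, very different FII constants.
n = 200
candidates = {
    "chain": (450.0, 30, -12.0),  # (neg. log2-likelihood, d, log2 FII)
    "star":  (445.0, 30, -35.0),
}
best = min(candidates,
           key=lambda m: refined_score(candidates[m][0], candidates[m][1],
                                       n, candidates[m][2]))
print(best)  # with d fixed, only the fit and the constant differ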
Acknowledgements
An earlier version of this paper was presented at the Second Workshop on Advanced Methodologies for Bayesian Networks (AMBN 2015) in Yokohama. The authors thank the anonymous reviewers for their insightful comments and suggestions, and the organizers of AMBN 2015 for the invitation to submit this work to the special issue. This work was funded in part by the Academy of Finland (Centre of Excellence COIN).
About this article
Cite this article
Zou, Y., Roos, T. On Model Selection, Bayesian Networks, and the Fisher Information Integral. New Gener. Comput. 35, 5–27 (2017). https://doi.org/10.1007/s00354-016-0002-y