Efficient identification of independence networks using mutual information

Bacciu, Davide; Etchells, Terence A.; Lisboa, Paulo J. G.; Whittaker, Joe

doi:10.1007/s00180-012-0320-6

Original Paper
Published: 20 March 2012

Computational Statistics, volume 28, pages 621–646 (2013)


Abstract

Conditional independence graphs are now widely applied in science and industry to display interactions between large numbers of variables. However, the computational load of structure identification grows with the number of nodes in the network and the sample size. A tailored version of the PC algorithm is proposed, based on mutual information tests with a specified testing order and combined with false negative reduction and false positive control. It is found to be competitive with current structure identification methodologies in both estimation accuracy and computational speed, and to outperform them in large-scale scenarios. The methodology is also shown to approximate dense networks. The comparisons are made on standard benchmarking data sets and an anonymized large-scale real-life example.
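
To make the idea of a mutual information test concrete, the following is a minimal sketch and not the authors' implementation. It assumes discrete variables, a plug-in estimate of the mutual information, and the standard asymptotic result that 2N times the estimated mutual information (in nats) is approximately chi-squared distributed under independence. The function names (`mutual_information`, `chi2_sf`, `mi_independence_test`) and the threshold `alpha` are hypothetical choices made for illustration. Only the marginal (unconditional) test is shown; the testing order, false negative reduction and false positive control described in the abstract are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared variable.

    Uses the power series for the regularized lower incomplete gamma
    function; intended for moderate statistics, not extreme tails.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term, total, k = 1.0, 1.0, 0
    while term > 1e-15 * total and k < 100000:
        k += 1
        term *= x / (a + k)
        total += term
    lower = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)


def mi_independence_test(xs, ys, alpha=0.05):
    """Test marginal independence of X and Y (assumed asymptotic G-test)."""
    n = len(xs)
    mi = mutual_information(xs, ys)
    g = 2.0 * n * mi  # G statistic
    df = (len(set(xs)) - 1) * (len(set(ys)) - 1)
    p = chi2_sf(g, df) if df > 0 else 1.0
    return mi, g, p, p < alpha  # last entry: reject independence?


if __name__ == "__main__":
    # Hypothetical toy data: y copies x, so the two are strongly dependent.
    x = [0, 1] * 50
    print(mi_independence_test(x, x))
```

A conditional version of this test, as needed inside a PC-style search, would under the same assumptions sum the statistic and the degrees of freedom over the strata defined by the conditioning variables.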



Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Pisa, Pisa, Italy
Davide Bacciu
School of Computing and Mathematical Sciences, Liverpool John Moores University, Liverpool, UK
Terence A. Etchells & Paulo J. G. Lisboa
Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
Joe Whittaker

Corresponding author

Correspondence to Davide Bacciu.

About this article

Cite this article

Bacciu, D., Etchells, T.A., Lisboa, P.J.G. et al. Efficient identification of independence networks using mutual information. Comput Stat 28, 621–646 (2013). https://doi.org/10.1007/s00180-012-0320-6

Received: 28 July 2011
Accepted: 09 March 2012
Published: 20 March 2012
Issue Date: April 2013
DOI: https://doi.org/10.1007/s00180-012-0320-6
