Abstract
Conditional independence graphs are now widely applied in science and industry to display interactions between large numbers of variables. However, the computational load of structure identification grows with both the number of nodes in the network and the sample size. A tailored version of the PC algorithm is proposed, based on mutual information tests with a specified testing order and combined with false negative reduction and false positive control. It is found to be competitive with current structure identification methodologies in both estimation accuracy and computational speed, and it outperforms them in large-scale scenarios. The methodology is also shown to approximate dense networks. The comparisons are made on standard benchmarking data sets and on an anonymized, large-scale, real-life example.
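To make the idea of a mutual information test concrete, the following is a minimal sketch and not the procedure of the article itself. It assumes two binary variables and an unconditional test, and it relies on the standard result that G = 2·N·MI (with MI in nats) is asymptotically chi-square distributed under independence, here with one degree of freedom. The function names `mutual_information` and `g_test_binary` are hypothetical; the conditional tests, the testing order, and the error-control steps described in the abstract are not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # joint counts n(x, y)
    px = Counter(xs)               # marginal counts n(x)
    py = Counter(ys)               # marginal counts n(y)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def g_test_binary(xs, ys):
    """G-test of independence, assuming both variables are binary.

    Assumption: G = 2 * N * MI is compared with a chi-square
    distribution with 1 degree of freedom, whose survival function
    is erfc(sqrt(g / 2)).
    """
    g = max(2.0 * len(xs) * mutual_information(xs, ys), 0.0)
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value
```

In a PC-style search, an edge between two nodes would be removed when a test of this kind fails to reject independence at the chosen significance level; a small p-value keeps the edge.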
Cite this article
Bacciu, D., Etchells, T.A., Lisboa, P.J.G. et al. Efficient identification of independence networks using mutual information. Comput Stat 28, 621–646 (2013). https://doi.org/10.1007/s00180-012-0320-6
DOI: https://doi.org/10.1007/s00180-012-0320-6