Abstract
We consider the problem of detecting conditional dependence between multiple discrete variables. This generalizes the well-known and widely studied problem of testing conditional independence between two variables given a third one. The problem arises in various applications. For example, in supervised learning, such a test can be used to verify the adequacy of the popular Naive Bayes classifier. In epidemiology, there is a need to verify whether the occurrences of multiple diseases are dependent. However, focusing solely on the occurrences of diseases may be misleading: one has to take into account confounding variables (such as gender or age) and preferably consider the conditional dependencies between diseases given those variables. To address this problem, we propose to use conditional multiinformation (CMI), a measure derived from information theory. We prove some new properties of CMI. To account for the uncertainty associated with a given data sample, we propose a formal statistical test of conditional independence based on the empirical version of CMI. The main contribution of the work is the determination of the asymptotic distribution of empirical CMI, which leads to the construction of an asymptotic test for conditional independence. The asymptotic test is compared with the permutation test and the scaled chi-squared test. Simulation experiments indicate that the asymptotic test achieves larger power than the competing methods, thus leading to more frequent detection of conditional dependencies when they occur. We apply the method to detect dependencies in the medical data set MIMIC-III.
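As a rough illustration of the quantities named in the abstract, the sketch below computes a plug-in estimate of CMI for discrete data and a permutation p-value. It assumes the standard definition of conditional multiinformation as the sum of the conditional entropies H(X_i | Z) minus the joint conditional entropy H(X_1, ..., X_p | Z); the paper's exact notation may differ. The function names (`entropy`, `empirical_cmi`, `permutation_pvalue`) and the within-stratum shuffling scheme are illustrative choices, not the authors' implementation, and the asymptotic test that is the paper's main contribution is not reproduced here.

```python
from collections import Counter

import numpy as np


def entropy(outcomes):
    """Plug-in Shannon entropy (in nats) of a list of hashable outcomes."""
    counts = np.array(list(Counter(outcomes).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def empirical_cmi(X, z):
    """Plug-in CMI, assuming CMI = sum_i H(X_i | Z) - H(X_1..X_p | Z).

    X is an (n, p) array of discrete values, z a length-n array of discrete
    conditioning values. Uses H(A | Z) = H(A, Z) - H(Z).
    """
    X = np.asarray(X)
    z_list = np.asarray(z).tolist()
    h_z = entropy(z_list)
    # Joint conditional entropy of all columns given z.
    joint = entropy([tuple(r) + (zi,) for r, zi in zip(X.tolist(), z_list)]) - h_z
    # Sum of per-column conditional entropies given z.
    marginals = sum(
        entropy(list(zip(X[:, j].tolist(), z_list))) - h_z
        for j in range(X.shape[1])
    )
    return marginals - joint


def permutation_pvalue(X, z, n_perm=200, seed=0):
    """Permutation p-value (illustrative scheme, not taken from the paper).

    Each column of X is shuffled independently within every stratum of z,
    which keeps each P(X_i | Z) intact while breaking conditional dependence.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    z = np.asarray(z)
    observed = empirical_cmi(X, z)
    exceed = 0
    for _ in range(n_perm):
        Xp = X.copy()
        for level in np.unique(z):
            idx = np.flatnonzero(z == level)
            for j in range(X.shape[1]):
                Xp[idx, j] = X[rng.permutation(idx), j]
        exceed += int(empirical_cmi(Xp, z) >= observed)
    return (1 + exceed) / (1 + n_perm)
```

A small p-value suggests that the variables in `X` are conditionally dependent given `z`. The plug-in estimate is nonnegative up to rounding and equals zero when the empirical conditional distribution factorizes exactly.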
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mielniczuk, J., Teisseyre, P. (2021). Detection of Conditional Dependence Between Multiple Variables Using Multiinformation. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12747. Springer, Cham. https://doi.org/10.1007/978-3-030-77980-1_51
DOI: https://doi.org/10.1007/978-3-030-77980-1_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77979-5
Online ISBN: 978-3-030-77980-1
eBook Packages: Computer Science, Computer Science (R0)