Abstract
An important question in microbiology is whether a treatment causes changes in gut flora, and whether it also affects metabolism. Reconstructing causal relations purely from non-temporal observational data is challenging. We address the problem of causal inference in the bivariate case, where the joint distribution of two variables is observed. State-of-the-art causal inference methods for continuous data suffer from high computational complexity. Some modern approaches are not suitable for categorical data, and others require estimating and fixing multiple hyper-parameters.
In this contribution, we focus on data on discrete domains and introduce a novel method of causal discovery based on the widely used assumption that if X causes Y, then P(X) and P(Y|X) are independent. We propose a semi-supervised approach in which P(Y|X) and P(X) are estimated from labeled and unlabeled data respectively, so that the marginal probability can be estimated from potentially much more (cheap, unlabeled) data than the conditional distribution. We validate the proposed method on the standard cause-effect pairs. Experiments on several benchmarks of biological network reconstruction illustrate that the proposed approach is very competitive in terms of computational time and accuracy compared to state-of-the-art methods. Finally, we apply the proposed method to an original medical task where we study whether drugs confound the human metagenome.
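As an illustration only, the estimation split described in the abstract can be sketched for discrete data as follows. This is a minimal sketch, not the authors' implementation: it assumes plain empirical frequencies, the function names and toy data are hypothetical, and it covers only the estimation of P(X) and P(Y|X), not the independence test used to decide the causal direction.

```python
from collections import Counter, defaultdict


def estimate_marginal(x_values):
    """Empirical P(X) from a sequence of discrete observations of X."""
    counts = Counter(x_values)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}


def estimate_conditional(pairs):
    """Empirical P(Y|X) from labeled (x, y) pairs."""
    joint = defaultdict(Counter)
    for x, y in pairs:
        joint[x][y] += 1
    return {
        x: {y: c / sum(ys.values()) for y, c in ys.items()}
        for x, ys in joint.items()
    }


# Toy data (hypothetical): a few labeled pairs and a larger pool of unlabeled X.
labeled = [(0, "a"), (0, "a"), (0, "b"), (1, "b")]
unlabeled_x = [0, 1, 1, 1, 0, 1]

# P(X) comes from the cheap unlabeled observations,
# P(Y|X) from the scarcer labeled pairs.
p_x = estimate_marginal(unlabeled_x)
p_y_given_x = estimate_conditional(labeled)
```

In practice the unlabeled pool would be much larger than the labeled set, which is the situation in which the abstract expects the marginal to be estimated more reliably than the conditional.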
Acknowledgements
This work was supported by PEPS (CNRS, France), project MaLeFHYCe.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Sokolovska, N., Permiakova, O., Forslund, S.K., Zucker, J.-D. (2018). A Semi-supervised Approach to Discover Bivariate Causality in Large Biological Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_32
DOI: https://doi.org/10.1007/978-3-319-96136-1_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96135-4
Online ISBN: 978-3-319-96136-1
eBook Packages: Computer Science (R0)