A Semi-supervised Approach to Discover Bivariate Causality in Large Biological Data

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10934)

Abstract

An important question in microbiology is whether treatment causes changes in gut flora, and whether it also affects metabolism. Reconstructing causal relations purely from non-temporal observational data is challenging. We address the problem of causal inference in the bivariate case, where the joint distribution of two variables is observed. State-of-the-art causal inference methods for continuous data suffer from high computational complexity. Some modern approaches are not suitable for categorical data, and others require several hyper-parameters to be estimated and fixed.

In this contribution, we focus on data on discrete domains, and we introduce a novel causal discovery method based on the widely used assumption that if X causes Y, then P(X) and P(Y|X) are independent. We propose a semi-supervised approach in which P(Y|X) is estimated from labeled data and P(X) from unlabeled data, so that the marginal distribution can potentially be estimated from far more (cheap, unlabeled) data than the conditional distribution. We validate the proposed method on the standard cause-effect pairs. Experiments on several benchmarks of biological network reconstruction show that the proposed approach is very competitive with state-of-the-art methods in terms of computational time and accuracy. Finally, we apply the proposed method to an original medical task in which we study whether drugs confound the human metagenome.
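
The split between the two data sources can be illustrated with a minimal sketch. It is not the authors' implementation and does not include their direction test; it only shows, under stated assumptions, how P(X) might be estimated from an unlabeled pool and P(Y|X) from a smaller labeled set on discrete domains. The function names, the additive smoothing constant `alpha`, and the toy data are all hypothetical.

```python
from collections import Counter


def estimate_marginal(x_unlabeled, n_states, alpha=1.0):
    """Estimate P(X) from unlabeled observations of X.

    Additive (Laplace) smoothing with constant alpha is an assumption
    of this sketch, not something stated in the abstract.
    """
    counts = Counter(x_unlabeled)
    total = len(x_unlabeled) + alpha * n_states
    return [(counts[x] + alpha) / total for x in range(n_states)]


def estimate_conditional(labeled_pairs, n_x, n_y, alpha=1.0):
    """Estimate P(Y|X) from labeled (x, y) pairs.

    Row x of the returned table is the smoothed distribution of Y given X = x.
    """
    counts = Counter(labeled_pairs)
    table = []
    for x in range(n_x):
        row_total = sum(counts[(x, y)] for y in range(n_y)) + alpha * n_y
        table.append([(counts[(x, y)] + alpha) / row_total for y in range(n_y)])
    return table


def joint_from_factors(p_x, p_y_given_x):
    """Recombine the two estimates into a joint table P(X, Y) = P(X) P(Y|X)."""
    return [[p_x[x] * p for p in row] for x, row in enumerate(p_y_given_x)]


# Hypothetical toy data: many cheap observations of X alone, few labeled pairs.
x_unlabeled = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
labeled_pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]

p_x = estimate_marginal(x_unlabeled, n_states=2)
p_y_given_x = estimate_conditional(labeled_pairs, n_x=2, n_y=2)
joint = joint_from_factors(p_x, p_y_given_x)
```

Because the two factors are estimated separately, the unlabeled pool can grow without requiring additional labeled pairs, which is the practical point of the semi-supervised setting described above.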

Notes

  1. http://www.cs.toronto.edu/~rsalakhu/DBM.html
  2. https://www.cs.toronto.edu/~rsalakhu/rbm_ais.html
  3. http://bnlearn.com/bnrepository/
  4. http://www.math.ku.dk/~peters/code.html

Acknowledgements

This work was supported by PEPS (CNRS, France), project MaLeFHYCe.

Author information

Corresponding author

Correspondence to Nataliya Sokolovska.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sokolovska, N., Permiakova, O., Forslund, S.K., Zucker, J.-D. (2018). A Semi-supervised Approach to Discover Bivariate Causality in Large Biological Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_32

  • DOI: https://doi.org/10.1007/978-3-319-96136-1_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96135-4

  • Online ISBN: 978-3-319-96136-1

  • eBook Packages: Computer Science (R0)
