Causal discovery on high dimensional data

Applied Intelligence

Abstract

Existing causal discovery algorithms are often neither effective nor efficient enough on high dimensional data, because high dimensionality reduces discovery accuracy and increases computational complexity. To alleviate these problems, we present a three-phase approach to learning the structure of nonlinear causal models that takes advantage of a feature selection method and two state-of-the-art causal discovery methods. In the first phase, a greedy search method based on Max-Relevance and Min-Redundancy is employed to discover the candidate causal set, and a rough skeleton of the causal network is generated accordingly. In the second phase, a constraint-based method is used to refine the rough skeleton into an accurate skeleton. In the third phase, the direction learning algorithm IGCI is applied to determine the direction of the causal relations in the accurate skeleton. The experimental results show that the proposed approach is both effective and scalable, and that it yields interesting findings on high dimensional data.
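The third phase can be illustrated with a minimal sketch of the slope-based IGCI estimator under a uniform reference measure, as the method is commonly described. This is not the authors' implementation: the function names `slope_score` and `igci_direction` are hypothetical, and the sketch assumes a pair of continuous variables that have already been found adjacent in the skeleton.

```python
import math


def _rescale(values):
    """Map values linearly onto [0, 1] (uniform reference measure)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable is constant; cannot rescale")
    return [(v - lo) / (hi - lo) for v in values]


def slope_score(x, y):
    """Estimate C_{X->Y}: the mean of log|dy/dx| over neighbours sorted by x."""
    pairs = sorted(zip(_rescale(x), _rescale(y)))
    total, count = 0.0, 0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 or dy == 0:
            continue  # ties make the log-slope undefined, so skip them
        total += math.log(abs(dy / dx))
        count += 1
    if count == 0:
        raise ValueError("no usable neighbouring pairs")
    return total / count


def igci_direction(x, y):
    """Return 'x->y' if the score favours X causing Y, else 'y->x'."""
    c_xy = slope_score(x, y)
    c_yx = slope_score(y, x)
    # The direction with the smaller score is taken as the causal one.
    return "x->y" if c_xy < c_yx else "y->x"


if __name__ == "__main__":
    # Illustrative data only: an evenly spaced cause and a cubic effect.
    xs = [i / 200 for i in range(201)]
    ys = [v ** 3 for v in xs]
    print(igci_direction(xs, ys))
```

In a pipeline like the one described above, such a function would be called once per undirected edge of the accurate skeleton. The score is only meaningful when the relationship is close to deterministic and invertible, which is the setting IGCI targets.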

Acknowledgments

This work is financially supported by the Natural Science Foundation of China (61100148, 61202269, 61472089), the Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (LYM11060), the Key Technology Research and Development Programs of Guangdong Province (2012B01010029), the Science and Technology Plan Project of Guangzhou City (12C42111607, 201200000031, 2013Y2-00034, 2014Y2-00027), the Specialized Research Fund for the Doctoral Program of Higher Education (20134420110010), the Opening Project of the State Key Laboratory for Novel Software Technology (KFKT2014B03), and the Discipline Construction and Quality Engineering of Higher Education in Guangdong Province (PT2011JSJ).

Author information

Corresponding author

Correspondence to Hao Zhang.

About this article

Cite this article

Hao, Z., Zhang, H., Cai, R. et al. Causal discovery on high dimensional data. Appl Intell 42, 594–607 (2015). https://doi.org/10.1007/s10489-014-0607-0

  • DOI: https://doi.org/10.1007/s10489-014-0607-0
