DOI: 10.1145/1835804.1835944
research-article

An efficient causal discovery algorithm for linear models

Published: 25 July 2010

ABSTRACT

Bayesian network learning algorithms have been widely used for causal discovery since the pioneering work of [13,18]. Among existing algorithms, the three-phase dependency analysis algorithm (TPDA) [5] is the most efficient in the sense that it has polynomial-time complexity. However, TPDA still has limitations. First, it depends on mutual information-based conditional independence (CI) tests, which makes it difficult to apply to continuous data. In addition, it uses two phases to obtain an approximate skeleton of the Bayesian network, which is inefficient in practice. In this paper, we propose a two-phase algorithm with partial correlation-based CI tests. In the first phase, the algorithm constructs a Markov random field from the data, which provides a close approximation to the structure of the true Bayesian network. In the second phase, it removes redundant edges according to CI tests to obtain the true Bayesian network. We show that the two-phase algorithm with partial correlation-based CI tests can handle continuous data following arbitrary distributions, not only Gaussian distributions.
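The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the Markov random field is built or which statistical test is used, so the sketch assumes that phase 1 keeps an edge when the partial correlation given all other variables is non-negligible, and that phase 2 drops an edge when some small subset of neighbouring variables makes the partial correlation vanish. The names `partial_corr`, `two_phase_skeleton`, `simulate_collider`, and the fixed `threshold` (a stand-in for a proper significance test) are all hypothetical.

```python
import itertools

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond.

    Computed from the inverse of the correlation sub-matrix over {i, j} + cond.
    """
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])


def two_phase_skeleton(data, threshold=0.05, max_cond=2):
    """Return a symmetric boolean adjacency matrix (undirected skeleton).

    threshold is an illustrative cut-off, not a calibrated CI test.
    """
    n_vars = data.shape[1]
    corr = np.corrcoef(data, rowvar=False)
    pairs = list(itertools.combinations(range(n_vars), 2))

    # Phase 1 (assumed construction): approximate Markov random field.
    # Keep edge i-j if the partial correlation given all other variables
    # is clearly non-zero.
    adj = np.zeros((n_vars, n_vars), dtype=bool)
    for i, j in pairs:
        rest = [k for k in range(n_vars) if k not in (i, j)]
        if abs(partial_corr(corr, i, j, rest)) > threshold:
            adj[i, j] = adj[j, i] = True

    # Phase 2: remove redundant edges. An edge is dropped as soon as some
    # subset of current neighbours (up to max_cond variables) separates i and j.
    for i, j in pairs:
        if not adj[i, j]:
            continue
        neighbours = [k for k in range(n_vars)
                      if k not in (i, j) and (adj[i, k] or adj[j, k])]
        for size in range(max_cond + 1):
            if any(abs(partial_corr(corr, i, j, subset)) <= threshold
                   for subset in itertools.combinations(neighbours, size)):
                adj[i, j] = adj[j, i] = False
                break
    return adj


def simulate_collider(n=20000, seed=0):
    """Toy linear model x0 -> x2 <- x1 with uniform (non-Gaussian) noise."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(-1, 1, n)
    x1 = rng.uniform(-1, 1, n)
    x2 = x0 + x1 + rng.uniform(-1, 1, n)
    return np.column_stack([x0, x1, x2])


if __name__ == "__main__":
    print(two_phase_skeleton(simulate_collider()).astype(int))
```

In the toy model the two parents x0 and x1 are linked in the Markov random field because they share a child, which is the kind of redundant edge the second phase is meant to remove: they are marginally uncorrelated, so the empty conditioning set is enough to drop the edge.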

References

  1. S. Andreassen, A. Rosenfalck, B. Falck, K. G. Olesen, and S. K. Andersen. Evaluation of the diagnostic performance of the expert EMG assistant MUNIN. Electroencephalography and Clinical Neurophysiology/Electromyography and Motor Control, 101(2):129--144, 1996.
  2. K. Baba, R. Shibata, and M. Sibuya. Partial correlation and conditional correlation as measures of conditional independence. Australian & New Zealand Journal of Statistics, 46(4):657--664, December 2004.
  3. I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. In Second European Conf. on Artif. Intell. in Medicine, volume 38, pages 247--256, London, Great Britain, 1989.
  4. J. Binder, D. Koller, S. Russell, and K. Kanazawa. Adaptive probabilistic networks with hidden variables. Machine Learning, 29(2):213--244, 1997.
  5. J. Cheng, R. Greiner, J. Kelly, D. A. Bell, and W. Liu. Learning Bayesian networks from data: An information-theory based approach. Artif. Intell., 137(1-2):43--90, 2002.
  6. P. Hoyer, A. Hyvarinen, R. Scheines, P. Spirtes, J. Ramsey, G. Lacerda, and S. Shimizu. Causal discovery of linear acyclic models with arbitrary distributions. In Proc. 24th Conf. on Uncertainty in Artif. Intell. (UAI-08), pages 282--289, Corvallis, Oregon, 2008. AUAI Press.
  7. A. Hyvarinen, S. Shimizu, and P. Hoyer. Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity. In Proc. of the 25th Int. Conf. on Mach. Learn., pages 424--431, Helsinki, Finland, 2008. ACM.
  8. A. L. Jensen and F. V. Jensen. Midas: An influence diagram for management of mildew in winter wheat. In Proc. of the Twelfth Annual Conf. on Uncertainty in Artif. Intell., pages 349--356, San Francisco, CA, USA, 1996. Morgan Kaufmann Publishers.
  9. R. Kindermann and J. L. Snell. Markov Random Fields and Their Applications. American Mathematical Society, 1980.
  10. K. Kristensen and I. A. Rasmussen. The use of a Bayesian network in the design of a decision support system for growing malting barley without use of pesticides. Computers and Electronics in Agriculture, 33(3):197--217, 2002.
  11. R. Opgen-Rhein and K. Strimmer. From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Systems Biology, 1(37):334--353, 2007.
  12. J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, March 2000.
  13. J. Pearl and T. Verma. A theory of inferred causation. In Proc. of the Second Int. Conf. on Principles of Knowledge Representation and Reasoning, 1991.
  14. J.-P. Pellet and A. Elisseeff. A partial correlation-based algorithm for causal structure discovery with continuous variables. In IDA, pages 229--239, 2007.
  15. S. Shimizu, P. O. Hoyer, A. Hyvarinen, and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res., 7:2003--2030, 2006.
  16. P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1):62--72, October 1991.
  17. P. Spirtes, C. Glymour, and R. Scheines. From probability to causality. In Proc. of Advanced Computing for the Social Sciences, 1990.
  18. P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer Verlag, Berlin, 1993.
  19. Z. Wang and L. Chan. A heuristic partial correlation-based algorithm for causal relationship discovery. In Intell. Data Engineering and Automated Learning - IDEAL 2009, pages 234--241, 2009.

Published in

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
