DOI: 10.1145/1553374.1553470

Regression by dependence minimization and its application to causal inference in additive noise models

Published: 14 June 2009

ABSTRACT

Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.
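
To make the pipeline described above concrete, here is a minimal sketch (not the authors' code) of regression by HSIC minimization and the resulting two-variable causal direction test for additive noise models. The Gaussian kernels, the fixed radial-basis-function expansion of the regression function, the least-squares warm start, and the L-BFGS-B optimizer are assumptions made here for illustration; all function names and parameter choices are hypothetical.

```python
# A minimal sketch, not the authors' implementation: regression by dependence
# (HSIC) minimization and the additive-noise decision rule for a pair of
# variables. Gaussian kernels, a fixed RBF basis, and L-BFGS-B are assumptions
# made for brevity; all names are illustrative.
import numpy as np
from scipy.optimize import minimize


def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel on a 1-D sample x of shape (n,)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate (Gretton et al., 2005) between 1-D samples."""
    n = x.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K = gaussian_gram(x, sigma)
    L = gaussian_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / n ** 2


def fit_by_dependence_minimization(x, y, n_basis=10, sigma=1.0):
    """Fit f(x) = sum_j w_j exp(-(x - c_j)^2 / (2 sigma^2)) by minimizing the
    HSIC between the regressor x and the residual y - f(x). The objective is
    generally non-convex; a least-squares fit serves as a warm start, as one
    plausible way to cope with local minima."""
    centers = np.linspace(x.min(), x.max(), n_basis)
    Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * sigma ** 2))
    w0 = np.linalg.lstsq(Phi, y, rcond=None)[0]
    res = minimize(lambda w: hsic(x, y - Phi @ w, sigma), w0, method="L-BFGS-B")
    residual = y - Phi @ res.x
    return residual, hsic(x, residual, sigma)


def infer_direction(x, y):
    """Return the direction whose fitted residuals are least dependent on the
    hypothetical cause, following the additive-noise-model reasoning."""
    _, dep_forward = fit_by_dependence_minimization(x, y)   # model y = f(x) + noise
    _, dep_backward = fit_by_dependence_minimization(y, x)  # model x = g(y) + noise
    return "x -> y" if dep_forward < dep_backward else "y -> x"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2.0, 2.0, 200)
    y = np.tanh(2.0 * x) + 0.1 * rng.standard_normal(200)   # additive-noise pair
    print(infer_direction(x, y))   # should typically print "x -> y" here
```

In the multivariate setting mentioned in the abstract, this regress-then-test step would be the basic building block, repeated a number of times that grows quadratically with the number of variables rather than over all possible DAGs.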


  • Published in

    ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
    June 2009, 1331 pages
    ISBN: 9781605585161
    DOI: 10.1145/1553374
    Copyright © 2009 by the author(s)/owner(s).

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Qualifiers

    • research-article

    Acceptance Rates

    Overall acceptance rate: 140 of 548 submissions, 26%
