ABSTRACT
Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach is that it does not assume a particular noise distribution, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data with more than two variables. The required number of regressions and independence tests is quadratic in the number of variables. This is a significant improvement over the brute-force method that tests all possible DAGs, whose number grows super-exponentially with the number of variables.
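The following Python code is a minimal sketch of regression by dependence minimization and its use for deciding causal direction. It rests on stated assumptions and is not the authors' implementation. The dependence measure is a biased empirical HSIC (Hilbert-Schmidt Independence Criterion) with Gaussian kernels and a median-distance bandwidth. The regression function is restricted to a cubic polynomial, since this abstract does not specify the function class. The decision rule, which prefers the direction whose residuals are less dependent on the input, is likewise assumed here. The helper names `hsic`, `dependence_min_regression` and `infer_direction` are hypothetical. SciPy's L-BFGS-B performs the optimization, starting from the ordinary least-squares fit, because the objective is non-convex and the starting point matters.

```python
import numpy as np
from scipy.optimize import minimize


def _gaussian_gram(z):
    """Gaussian-kernel Gram matrix; bandwidth set by the median heuristic (assumed)."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    positive = sq[sq > 0]
    med = np.median(positive) if positive.size else 1.0  # guard: constant input
    return np.exp(-sq / (2.0 * med))


def _center(K):
    """Return H K H with H = I - 11^T / n, computed in O(n^2)."""
    return K - K.mean(axis=0) - K.mean(axis=1, keepdims=True) + K.mean()


def hsic(a, b):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(a)
    Kc = _center(_gaussian_gram(a))
    Lc = _center(_gaussian_gram(b))
    return float(np.sum(Kc * Lc)) / (n - 1) ** 2


def dependence_min_regression(x, y, degree=3):
    """Hypothetical helper: fit polynomial f by minimizing HSIC(x, y - f(x))."""
    # No constant column: HSIC ignores a constant shift of the residuals.
    Phi = np.column_stack([x ** k for k in range(1, degree + 1)])
    # The objective is non-convex, so start from the ordinary least-squares fit.
    A = np.column_stack([np.ones_like(x), Phi])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    res = minimize(lambda w: hsic(x, y - Phi @ w), coef[1:], method="L-BFGS-B")
    residuals = y - Phi @ res.x
    return res.x, residuals - residuals.mean()


def infer_direction(x, y, degree=3):
    """Hypothetical rule: prefer the direction whose residuals are less dependent."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    _, r_fwd = dependence_min_regression(x, y, degree)
    _, r_bwd = dependence_min_regression(y, x, degree)
    h_fwd, h_bwd = hsic(x, r_fwd), hsic(y, r_bwd)
    return ("x->y" if h_fwd < h_bwd else "y->x"), h_fwd, h_bwd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 300)
    y = x ** 3 + rng.uniform(-0.25, 0.25, 300)  # synthetic pair generated as x -> y
    print(infer_direction(x, y))
```

The script prints the chosen direction and both HSIC values. On data generated as x -> y with non-Gaussian additive noise, the forward value should usually be the smaller one, but a finite sample gives no guarantee. For more than two variables, one way to obtain a quadratic count is a sink search. This is an assumption made for illustration, not a description of the paper's algorithm. The search regresses each remaining variable on all the others, keeps the variable whose residuals are least dependent on its regressors, removes it, and repeats. With d variables this takes d + (d - 1) + ... + 2 = d(d + 1)/2 - 1 regressions, whereas the number of candidate DAGs grows super-exponentially in d.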