Abstract
In recent years, much research has been conducted in the area of causal inference and causal learning, and many methods have been developed to identify cause-effect pairs. These methods have also proved able to successfully determine the direction of causal relationships from observational real-world data. Yet in bivariate situations, causal discovery problems remain challenging. One class of methods that can also tackle the bivariate case is based on Additive Noise Models (ANMs). Unfortunately, one aspect of these methods has received little attention so far: what is the impact of different noise levels on their ability to identify the direction of the causal relationship? This work aims to bridge this gap with the help of an empirical study. We consider a bivariate case and two specific methods: Regression with Subsequent Independence Test (RESIT) and Identification using Conditional Variances. We perform a set of experiments with an exhaustive range of ANMs in which the additive noise levels gradually change from 1% to 10000% of the cause's noise level (the latter remains fixed). Additionally, we consider several different types of distributions as well as linear and non-linear ANMs. The results of the experiments show that these causal discovery methods can fail to capture the true causal direction for certain noise levels.
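As an illustration of this setup, a minimal sketch follows. It is not the authors' code: the Laplace cause, uniform effect noise, the function choices, and the name generate_anm are only examples of the distributions and ANMs the study considers.

```python
# Minimal sketch of the data-generating setup (illustrative only): bivariate
# ANM data y = f(x) + n_y, where the effect-noise level is a percentage of the
# cause's (fixed) noise level.
import numpy as np

rng = np.random.default_rng(0)

def generate_anm(n=500, noise_pct=100.0, linear=True):
    """Sample (x, y) with sd(n_y) set to noise_pct percent of the cause's scale."""
    x = rng.laplace(size=n)                        # cause; its noise level stays fixed
    scale = (noise_pct / 100.0) * np.std(x)        # 1% .. 10000% of the cause's level
    n_y = rng.uniform(-1.0, 1.0, size=n) * scale   # additive effect noise
    f = (lambda z: 2.0 * z) if linear else (lambda z: z ** 3)
    return x, f(x) + n_y

for pct in (1, 10, 100, 1000, 10000):              # the range used in the experiments
    x, y = generate_anm(noise_pct=pct)
    print(pct, round(float(np.std(y - 2.0 * x)), 3))  # residual noise grows with pct
```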
Notes
1. The RESIT method is described in Sect. 3.2.
2. The residuals are defined as the difference between the actual output and the predicted output.
3. Full identifiability means that not only the skeleton of the causal graph is recoverable but also the orientation of its edges.
4. A Markov equivalence class refers to a class of graphs in which all graphs have the same skeleton and the same v-structures.
5. The Uncertainty Scoring method is described in Sect. 3.3.
6. Source: https://github.com/amber0309/HSIC.
7. A low-rank decomposition of the Gram matrices, which permits an accurate approximation to HSIC as long as the kernel has a fast-decaying spectrum.
8. The true direction of the causal relationship is known because we generate synthetic data.
9.
10. A quick test in a Python shell, with \(i = 57\), \(X \sim \mathcal {L}\), and \(N_y \sim \mathcal {U}\), repeated 100 times, showed that in these runs the ordering was always correct, but the independence tests were correct in only 35 of the 100 repetitions (a hedged reconstruction of this test is sketched after this list).
11. Source: https://github.com/amber0309/HSIC.
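The quick test in footnote 10 can be reconstructed roughly as follows. This is a hedged sketch, not the authors' code: we assume \(i = 57\) serves as a random seed, use a k-NN regressor in place of whichever regression method was used, score dependence with a minimal biased HSIC statistic, and omit the p-value-based independence decision that footnote 10 also evaluates.

```python
# Hedged reconstruction of footnote 10's quick test (assumptions: i = 57 is a
# seed; k-NN regression; biased HSIC statistic with RBF kernels; only the
# ordering is checked, not the independence-test acceptance step).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(57)

def hsic(a, b, sigma=1.0):
    """Minimal biased HSIC statistic with RBF kernels."""
    a, b = a.reshape(-1, 1), b.reshape(-1, 1)
    K = np.exp(-(a - a.T) ** 2 / (2 * sigma ** 2))
    L = np.exp(-(b - b.T) ** 2 / (2 * sigma ** 2))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def residual_dependence(cause, effect):
    """RESIT step: regress effect on cause, measure dependence of residuals on cause."""
    model = KNeighborsRegressor(n_neighbors=10).fit(cause.reshape(-1, 1), effect)
    residuals = effect - model.predict(cause.reshape(-1, 1))
    return hsic(cause, residuals)

correct = 0
for _ in range(100):                               # 100 repetitions, as in footnote 10
    x = rng.laplace(size=300)                      # X ~ Laplace
    y = x ** 3 + rng.uniform(-1, 1, size=300)      # nonlinear ANM, N_y ~ Uniform
    if residual_dependence(x, y) < residual_dependence(y, x):
        correct += 1                               # smaller dependence => correct ordering
print(f"correct ordering in {correct}/100 runs")
```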
Acknowledgments
This work was partially supported by the European Union's Horizon 2020 research programme within the project CITIES2030 “Co-creating resilient and sustainable food towards FOOD2030”, grant 101000640.
Appendix
1.1 Detailed Description of Estimators
1. HSIC: Hilbert-Schmidt Independence Criterion with RBF kernel (see footnote 11)
$$I_{HSIC}(x,y) := ||C_{xy}||^2_{HS}$$
where \(C_{xy}\) is the cross-covariance operator and \(||\cdot ||_{HS}\) the Hilbert-Schmidt norm.
2. HSIC_IC: Hilbert-Schmidt Independence Criterion using an incomplete Cholesky decomposition (a low-rank decomposition of the Gram matrices, which permits an accurate approximation to HSIC as long as the kernel has a fast-decaying spectrum), with precision \(\eta = 10^{-6}\) in the incomplete Cholesky decomposition.
3. HSIC_IC2: Same as HSIC_IC but with \(\eta = 10^{-2}\).
4. DISTCOV: Distance covariance estimator using pairwise distances. This is simply the \(L^2_w\) norm of the difference between the joint characteristic function \(\varphi _{12}\) and the product of the marginal characteristic functions \(\varphi _1 \varphi _2\) of the inputs x, y:
$$\varphi _{12}(\boldsymbol{u}^1,\boldsymbol{u}^2) = \mathbb {E}[e^{i\langle \boldsymbol{u}^1, \boldsymbol{x} \rangle + i\langle \boldsymbol{u}^2, \boldsymbol{y} \rangle }],$$
$$\varphi _1(\boldsymbol{u}^1) = \mathbb {E}[e^{i\langle \boldsymbol{u}^1, \boldsymbol{x} \rangle }],$$
$$\varphi _2(\boldsymbol{u}^2) = \mathbb {E}[e^{i\langle \boldsymbol{u}^2, \boldsymbol{y} \rangle }],$$
where \(i = \sqrt{-1}\), \(\langle \cdot , \cdot \rangle \) is the standard Euclidean inner product, and \(\mathbb {E}\) the expectation. Finally, we have:
$$I_{dCov}(x,y) = ||\varphi _{12} - \varphi _1 \varphi _2||_{L^2_w}$$
5. DISTCORR: Distance correlation estimator using pairwise distances. It is simply the standardized version of the distance covariance (a numpy sketch is given after this list):
$$I_{dCor}(x,y) = {\left\{ \begin{array}{ll} \frac{I_{dCov}(x,y)}{\sqrt{I_{dVar}(x,x)I_{dVar}(y,y)}}, &{}\text {if } I_{dVar}(x,x)I_{dVar}(y,y) > 0, \\ 0, &{} \text {otherwise,} \end{array}\right. }$$
with
$$I_{dVar}(x,x) = ||\varphi _{11} - \varphi _1 \varphi _1||_{L^2_w}, \quad I_{dVar}(y,y) = ||\varphi _{22} - \varphi _2 \varphi _2||_{L^2_w}$$
(see the characteristic functions under item 4, DISTCOV).
6. HOEFFDING: Hoeffding's Phi
$$I_{\varPhi }(x,y) = I_{\varPhi }(C) = \left( h_2(d) \int _{[0,1]^d} [C(\boldsymbol{u}) - \Pi (\boldsymbol{u})]^2 \, d\boldsymbol{u}\right) ^{\frac{1}{2}}$$
with C the copula of the input, \(\Pi \) the product copula, and \(h_2(d)\) a dimension-dependent normalization factor.
7. SH_KNN: Shannon differential entropy estimator using kNNs (k-nearest neighbours)
$$H(\boldsymbol{Y}_{1:T}) = \log (T-1) - \psi (k) + \log (V_d) + \frac{d}{T}\sum ^T_{t=1}\log (\rho _k(t))$$
with T the number of samples, \(\psi \) the digamma function, \(\rho _k(t)\) the Euclidean distance from \(\boldsymbol{y}_t\) to its \(k^{th}\) nearest neighbour in the sample \(\boldsymbol{Y}_{1:T}\backslash \{\boldsymbol{y}_t\}\), and \(V_d\) the volume of the d-dimensional Euclidean unit ball.
8. SH_KNN_2: Same as SH_KNN but using a kd-tree for quick nearest-neighbour lookup.
9. SH_KNN_3: Same as SH_KNN but with \(k=5\).
10. SH_MAXENT1: Maximum entropy distribution-based Shannon entropy estimator
$$H(\boldsymbol{Y}_{1:T}) = H(n) - \left[ k_1 \left( \frac{1}{T}\sum ^T_{t=1}G_1(y'_t)\right) ^2 + k_2 \left( \frac{1}{T}\sum ^T_{t=1}G_2(y'_t)-\sqrt{\frac{2}{\pi }}\right) ^2\right] + \log (\hat{\sigma }),$$
with \(H(n)\) the entropy of a standard normal variable and
$$\hat{\sigma } = \hat{\sigma }(\boldsymbol{Y}_{1:T}) = \sqrt{\frac{1}{T-1}\sum ^T_{t=1}(y_t)^2},$$
$$y'_t = \frac{y_t}{\hat{\sigma }} \quad (t= 1, \dots , T),$$
$$G_1(z) = ze^{\frac{-z^2}{2}}, \qquad G_2(z) = |z|,$$
$$k_1 = \frac{36}{8\sqrt{3}-9}, \qquad k_2 = \frac{1}{2-\frac{6}{\pi }}.$$
11. SH_MAXENT2: Maximum entropy distribution-based Shannon entropy estimator; same as SH_MAXENT1 with the following changes:
$$G_2(z) = e^{\frac{-z^2}{2}}, \qquad k_2 = \frac{24}{16\sqrt{3}-27}.$$
12. SH_SPACING_V: Shannon entropy estimator using Vasicek's spacing method (a numpy sketch is given after this list).
$$H(\boldsymbol{Y}_{1:T}) = \frac{1}{T}\sum ^T_{t=1}\log \left( \frac{T}{2m}[y_{(t+m)}-y_{(t-m)}]\right)$$
with T the number of samples, \(m = \lfloor \sqrt{T} \rfloor \), and the convention that \(y_{(t)} := y_{(1)}\) if \(t-m < 1\) and \(y_{(t)} := y_{(T)}\) if \(t+m > T\).
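For concreteness, here are minimal numpy sketches of two of the estimators above: the distance correlation of items 4-5 and Vasicek's spacing entropy of item 12. These are illustrative reconstructions from the formulas, assuming one-dimensional continuous samples, not the toolbox implementations used in the experiments.

```python
# Illustrative numpy reconstructions of two estimators from the list above.
# Not the ITE toolbox code used in the paper; assumes 1-D, continuous samples.
import numpy as np

def dist_corr(x, y):
    """Distance correlation from pairwise distances (items 4-5)."""
    def doubly_centered(v):
        d = np.abs(v.reshape(-1, 1) - v.reshape(1, -1))       # pairwise distances
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = doubly_centered(x), doubly_centered(y)
    dcov2 = (A * B).mean()                                    # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())          # dVar(x) * dVar(y)
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

def vasicek_entropy(y):
    """Shannon entropy via Vasicek's spacing method (item 12), m = floor(sqrt(T))."""
    T = len(y)
    m = int(np.floor(np.sqrt(T)))
    ys = np.sort(y)
    upper = ys[np.minimum(np.arange(T) + m, T - 1)]           # y_(t+m), clipped to y_(T)
    lower = ys[np.maximum(np.arange(T) - m, 0)]               # y_(t-m), clipped to y_(1)
    return np.mean(np.log(T / (2 * m) * (upper - lower)))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(dist_corr(x, x ** 2))    # dependent pair: clearly above 0
print(vasicek_entropy(x))      # close to the Gaussian entropy 0.5*log(2*pi*e) ~ 1.419
```

A distance correlation near 0 for (cause, residuals) is what the RESIT-style methods read as independence, which is why such estimators appear in this comparison.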
Cite this paper
Kap, B., Aleksandrova, M., Engel, T. (2022). The Effect of Noise Level on the Accuracy of Causal Discovery Methods with Additive Noise Models. In: Leiva, L.A., Pruski, C., Markovich, R., Najjar, A., Schommer, C. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2021. Communications in Computer and Information Science, vol 1530. Springer, Cham. https://doi.org/10.1007/978-3-030-93842-0_7