Abstract
In recent years, much research has been conducted in the area of causal inference and causal learning, and many methods have been developed to identify cause-effect pairs. These methods have also proved able to successfully determine the direction of causal relationships from observational real-world data. Yet in bivariate situations, causal discovery problems remain challenging. One class of methods that can also tackle the bivariate case is based on Additive Noise Models (ANMs). Unfortunately, one aspect of these methods has received little attention so far: what is the impact of different noise levels on their ability to identify the direction of the causal relationship? This work aims to bridge this gap with the help of an empirical study. We consider a bivariate case and two specific methods: Regression with Subsequent Independence Test (RESIT) and Identification using Conditional Variances. We perform a set of experiments with an exhaustive range of ANMs in which the additive noise levels gradually change from 1% to 10000% of the cause's noise level (the latter remains fixed). Additionally, we consider several different types of distributions as well as linear and non-linear ANMs. The results of the experiments show that these causal discovery methods can fail to capture the true causal direction for certain noise levels.
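As an illustration of this setup, a minimal sketch follows. It is not the authors' code: the Laplace cause, uniform effect noise, the function choices, and the name generate_anm are only examples of the distributions and ANMs the study considers.

```python
# Minimal sketch of the data-generating setup (illustrative only): bivariate
# ANM data y = f(x) + n_y, where the effect-noise level is a percentage of the
# cause's (fixed) noise level.
import numpy as np

rng = np.random.default_rng(0)

def generate_anm(n=500, noise_pct=100.0, linear=True):
    """Sample (x, y) with sd(n_y) set to noise_pct percent of the cause's scale."""
    x = rng.laplace(size=n)                        # cause; its noise level stays fixed
    scale = (noise_pct / 100.0) * np.std(x)        # 1% .. 10000% of the cause's level
    n_y = rng.uniform(-1.0, 1.0, size=n) * scale   # additive effect noise
    f = (lambda z: 2.0 * z) if linear else (lambda z: z ** 3)
    return x, f(x) + n_y

for pct in (1, 10, 100, 1000, 10000):              # the range used in the experiments
    x, y = generate_anm(noise_pct=pct)
    print(pct, round(float(np.std(y - 2.0 * x)), 3))  # residual noise grows with pct
```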
Notes
1. The RESIT method is described in Sect. 3.2.
2. The residuals are defined as the difference between the actual output and the predicted output.
3. Full identifiability means that not only the skeleton of the causal graph is recoverable but also the orientation of its edges.
4. A Markov equivalence class refers to a class of graphs in which all graphs have the same skeleton and the same v-structures.
5. The Uncertainty Scoring method is described in Sect. 3.3.
6. Source: https://github.com/amber0309/HSIC.
7. A low-rank decomposition of the Gram matrices, which permits an accurate approximation to HSIC as long as the kernel has a fast-decaying spectrum.
8. The true direction of the causal relationship is known because we generate synthetic data.
9.
10. A quick test in a Python shell, with \(i = 57\), \(X \sim \mathcal {L}\), and \(N_y \sim \mathcal {U}\), repeated 100 times, showed that in these runs the ordering was always correct, but the independence tests were correct in only 35 of the 100 repetitions (a hedged reconstruction of this test is sketched after this list).
11. Source: https://github.com/amber0309/HSIC.
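The quick test in footnote 10 can be reconstructed roughly as follows. This is a hedged sketch, not the authors' code: we assume \(i = 57\) serves as a random seed, use a k-NN regressor in place of whichever regression method was used, score dependence with a minimal biased HSIC statistic, and omit the p-value-based independence decision that footnote 10 also evaluates.

```python
# Hedged reconstruction of footnote 10's quick test (assumptions: i = 57 is a
# seed; k-NN regression; biased HSIC statistic with RBF kernels; only the
# ordering is checked, not the independence-test acceptance step).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(57)

def hsic(a, b, sigma=1.0):
    """Minimal biased HSIC statistic with RBF kernels."""
    a, b = a.reshape(-1, 1), b.reshape(-1, 1)
    K = np.exp(-(a - a.T) ** 2 / (2 * sigma ** 2))
    L = np.exp(-(b - b.T) ** 2 / (2 * sigma ** 2))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def residual_dependence(cause, effect):
    """RESIT step: regress effect on cause, measure dependence of residuals on cause."""
    model = KNeighborsRegressor(n_neighbors=10).fit(cause.reshape(-1, 1), effect)
    residuals = effect - model.predict(cause.reshape(-1, 1))
    return hsic(cause, residuals)

correct = 0
for _ in range(100):                               # 100 repetitions, as in footnote 10
    x = rng.laplace(size=300)                      # X ~ Laplace
    y = x ** 3 + rng.uniform(-1, 1, size=300)      # nonlinear ANM, N_y ~ Uniform
    if residual_dependence(x, y) < residual_dependence(y, x):
        correct += 1                               # smaller dependence => correct ordering
print(f"correct ordering in {correct}/100 runs")
```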
Acknowledgments
This work was partially supported by the European Union's Horizon 2020 research programme within the project CITIES2030 “Co-creating resilient and sustainable food towards FOOD2030”, grant 101000640.
Appendix
1.1 Detailed Description of Estimators
1. HSIC: Hilbert-Schmidt Independence Criterion with RBF kernel (see footnote 11)
$$I_{HSIC}(x,y) := ||C_{xy}||^2_{HS}$$
where \(C_{xy}\) is the cross-covariance operator and \(||\cdot ||_{HS}\) the Hilbert-Schmidt norm.
2. HSIC_IC: Hilbert-Schmidt Independence Criterion using an incomplete Cholesky decomposition (a low-rank decomposition of the Gram matrices, which permits an accurate approximation to HSIC as long as the kernel has a fast-decaying spectrum), with precision \(\eta = 10^{-6}\) in the incomplete Cholesky decomposition.
3. HSIC_IC2: Same as HSIC_IC but with \(\eta = 10^{-2}\).
4. DISTCOV: Distance covariance estimator using pairwise distances. This is simply the \(L^2_w\) norm of the difference between the joint characteristic function \(\varphi _{12}\) and the product of the marginal characteristic functions \(\varphi _1 \varphi _2\) of the inputs x, y:
$$\varphi _{12}(\boldsymbol{u}^1,\boldsymbol{u}^2) = \mathbb {E}[e^{i\langle \boldsymbol{u}^1, \boldsymbol{x} \rangle + i\langle \boldsymbol{u}^2, \boldsymbol{y} \rangle }],$$
$$\varphi _1(\boldsymbol{u}^1) = \mathbb {E}[e^{i\langle \boldsymbol{u}^1, \boldsymbol{x} \rangle }],$$
$$\varphi _2(\boldsymbol{u}^2) = \mathbb {E}[e^{i\langle \boldsymbol{u}^2, \boldsymbol{y} \rangle }],$$
where \(i = \sqrt{-1}\), \(\langle \cdot , \cdot \rangle \) is the standard Euclidean inner product, and \(\mathbb {E}\) the expectation. Finally, we have:
$$I_{dCov}(x,y) = ||\varphi _{12} - \varphi _1 \varphi _2||_{L^2_w}$$
5. DISTCORR: Distance correlation estimator using pairwise distances. It is simply the standardized version of the distance covariance (a numpy sketch is given after this list):
$$I_{dCor}(x,y) = {\left\{ \begin{array}{ll} \frac{I_{dCov}(x,y)}{\sqrt{I_{dVar}(x,x)I_{dVar}(y,y)}}, &{}\text {if } I_{dVar}(x,x)I_{dVar}(y,y) > 0, \\ 0, &{} \text {otherwise,} \end{array}\right. }$$
with
$$I_{dVar}(x,x) = ||\varphi _{11} - \varphi _1 \varphi _1||_{L^2_w}, \quad I_{dVar}(y,y) = ||\varphi _{22} - \varphi _2 \varphi _2||_{L^2_w}$$
(see the characteristic functions under item 4, DISTCOV).
6. HOEFFDING: Hoeffding's Phi
$$I_{\varPhi }(x,y) = I_{\varPhi }(C) = \left( h_2(d) \int _{[0,1]^d} [C(\boldsymbol{u}) - \Pi (\boldsymbol{u})]^2 \, d\boldsymbol{u}\right) ^{\frac{1}{2}}$$
with C the copula of the input, \(\Pi \) the product copula, and \(h_2(d)\) a dimension-dependent normalization factor.
7. SH_KNN: Shannon differential entropy estimator using kNNs (k-nearest neighbours)
$$H(\boldsymbol{Y}_{1:T}) = \log (T-1) - \psi (k) + \log (V_d) + \frac{d}{T}\sum ^T_{t=1}\log (\rho _k(t))$$
with T the number of samples, \(\psi \) the digamma function, \(\rho _k(t)\) the Euclidean distance from \(\boldsymbol{y}_t\) to its \(k^{th}\) nearest neighbour in the sample \(\boldsymbol{Y}_{1:T}\backslash \{\boldsymbol{y}_t\}\), and \(V_d\) the volume of the d-dimensional Euclidean unit ball.
8. SH_KNN_2: Same as SH_KNN but using a kd-tree for quick nearest-neighbour lookup.
9. SH_KNN_3: Same as SH_KNN but with \(k=5\).
10. SH_MAXENT1: Maximum entropy distribution-based Shannon entropy estimator
$$H(\boldsymbol{Y}_{1:T}) = H(n) - \left[ k_1 \left( \frac{1}{T}\sum ^T_{t=1}G_1(y'_t)\right) ^2 + k_2 \left( \frac{1}{T}\sum ^T_{t=1}G_2(y'_t)-\sqrt{\frac{2}{\pi }}\right) ^2\right] + \log (\hat{\sigma }),$$
with \(H(n)\) the entropy of a standard normal variable and
$$\hat{\sigma } = \hat{\sigma }(\boldsymbol{Y}_{1:T}) = \sqrt{\frac{1}{T-1}\sum ^T_{t=1}(y_t)^2},$$
$$y'_t = \frac{y_t}{\hat{\sigma }} \quad (t= 1, \dots , T),$$
$$G_1(z) = ze^{\frac{-z^2}{2}}, \qquad G_2(z) = |z|,$$
$$k_1 = \frac{36}{8\sqrt{3}-9}, \qquad k_2 = \frac{1}{2-\frac{6}{\pi }}.$$
11. SH_MAXENT2: Maximum entropy distribution-based Shannon entropy estimator; same as SH_MAXENT1 with the following changes:
$$G_2(z) = e^{\frac{-z^2}{2}}, \qquad k_2 = \frac{24}{16\sqrt{3}-27}.$$
12. SH_SPACING_V: Shannon entropy estimator using Vasicek's spacing method (a numpy sketch is given after this list).
$$H(\boldsymbol{Y}_{1:T}) = \frac{1}{T}\sum ^T_{t=1}\log \left( \frac{T}{2m}[y_{(t+m)}-y_{(t-m)}]\right)$$
with T the number of samples, \(m = \lfloor \sqrt{T} \rfloor \), and the convention that \(y_{(t)} := y_{(1)}\) if \(t-m < 1\) and \(y_{(t)} := y_{(T)}\) if \(t+m > T\).
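For concreteness, here are minimal numpy sketches of two of the estimators above: the distance correlation of items 4-5 and Vasicek's spacing entropy of item 12. These are illustrative reconstructions from the formulas, assuming one-dimensional continuous samples, not the toolbox implementations used in the experiments.

```python
# Illustrative numpy reconstructions of two estimators from the list above.
# Not the ITE toolbox code used in the paper; assumes 1-D, continuous samples.
import numpy as np

def dist_corr(x, y):
    """Distance correlation from pairwise distances (items 4-5)."""
    def doubly_centered(v):
        d = np.abs(v.reshape(-1, 1) - v.reshape(1, -1))       # pairwise distances
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = doubly_centered(x), doubly_centered(y)
    dcov2 = (A * B).mean()                                    # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())          # dVar(x) * dVar(y)
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

def vasicek_entropy(y):
    """Shannon entropy via Vasicek's spacing method (item 12), m = floor(sqrt(T))."""
    T = len(y)
    m = int(np.floor(np.sqrt(T)))
    ys = np.sort(y)
    upper = ys[np.minimum(np.arange(T) + m, T - 1)]           # y_(t+m), clipped to y_(T)
    lower = ys[np.maximum(np.arange(T) - m, 0)]               # y_(t-m), clipped to y_(1)
    return np.mean(np.log(T / (2 * m) * (upper - lower)))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(dist_corr(x, x ** 2))    # dependent pair: clearly above 0
print(vasicek_entropy(x))      # close to the Gaussian entropy 0.5*log(2*pi*e) ~ 1.419
```

A distance correlation near 0 for (cause, residuals) is what the RESIT-style methods read as independence, which is why such estimators appear in this comparison.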
Cite this paper
Kap, B., Aleksandrova, M., Engel, T. (2022). The Effect of Noise Level on the Accuracy of Causal Discovery Methods with Additive Noise Models. In: Leiva, L.A., Pruski, C., Markovich, R., Najjar, A., Schommer, C. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2021. Communications in Computer and Information Science, vol 1530. Springer, Cham. https://doi.org/10.1007/978-3-030-93842-0_7