
The Effect of Noise Level on the Accuracy of Causal Discovery Methods with Additive Noise Models

  • Conference paper
  • First Online:
Artificial Intelligence and Machine Learning (BNAIC/Benelearn 2021)

Abstract

In recent years, much research has been conducted in the areas of causal inference and causal learning. Many methods have been developed to identify cause-effect pairs, and these methods have proved able to determine the direction of causal relationships from observational real-world data. Yet in bivariate situations, causal discovery remains challenging. One class of methods that can also tackle the bivariate case is based on Additive Noise Models (ANMs). Unfortunately, one aspect of these methods has received little attention so far: what is the impact of different noise levels on their ability to identify the direction of the causal relationship? This work aims to bridge this gap with an empirical study. We consider the bivariate case and two specific methods: Regression with Subsequent Independence Test (RESIT) and Identification using Conditional Variances. We perform a set of experiments over an exhaustive range of ANMs in which the level of the additive noise gradually changes from 1% to 10000% of the cause's noise level (the latter remains fixed). In addition, we consider several different types of distributions as well as linear and non-linear ANMs. The results show that these causal discovery methods can fail to capture the true causal direction for some noise levels.
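To make the experimental setup concrete, here is a minimal Python sketch of the kind of data generation the abstract describes: a bivariate ANM \(y = f(x) + N_y\) whose effect-noise level is swept from 1% to 10000% of the cause's level. The specific distributions, the choice of \(f\), and the sample size are illustrative assumptions, not the paper's exact protocol (the authors' code is linked in note 9 below).

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_anm_pair(n, noise_ratio, linear=True):
    """Sample a bivariate ANM y = f(x) + N_y.

    noise_ratio scales the effect noise relative to the cause's
    standard deviation: 0.01 corresponds to 1%, 100.0 to 10000%.
    """
    x = rng.normal(size=n)                          # cause (its noise level stays fixed)
    f = (lambda z: 2.0 * z) if linear else np.tanh  # linear or non-linear mechanism
    n_y = noise_ratio * x.std() * rng.normal(size=n)
    return x, f(x) + n_y

# Sweep the effect-noise level across several orders of magnitude.
for ratio in [0.01, 0.1, 1.0, 10.0, 100.0]:
    x, y = generate_anm_pair(1000, ratio)
    print(f"noise ratio {ratio:>6}: std(y) = {y.std():.2f}")
```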


Notes

  1. The RESIT method is described in Sect. 3.2 (a minimal sketch of its decision rule follows these notes).

  2. The residuals are defined as the difference between the actual output and the predicted output.

  3. Full identifiability means that not only the skeleton of the causal graph is recoverable but also the directions of the arrows.

  4. The Markov equivalence class refers to the class of graphs that share the same skeleton and the same v-structures.

  5. The Uncertainty Scoring method is described in Sect. 3.3.

  6. Source: https://github.com/amber0309/HSIC.

  7. A low-rank decomposition of the Gram matrices, which permits an accurate approximation to HSIC as long as the kernel has a fast-decaying spectrum.

  8. The true direction of the causal relationship is known because we generate synthetic data.

  9. https://gitlab.com/Shinkaiika/noise-level-causal-identification-additive-noise-models.

  10. A quick test in a Python shell, with \(i = 57\), \(X \sim \mathcal {L}\), \(N_y \sim \mathcal {U}\), and 100 repetitions, showed that the ordering was always correct, but the independence tests were correct in only 35 of the 100 runs.

  11. Source: https://github.com/amber0309/HSIC.
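Notes 1, 2, and 10 above refer to the RESIT decision rule: regress each variable on the other and prefer the direction whose residuals are (more nearly) independent of the input. As a rough illustration only, and not the authors' implementation (that is in the repository linked in note 9), a sketch might look as follows; the use of kernel ridge regression, the median-heuristic bandwidth, and the biased HSIC estimator are assumptions made here for brevity.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def rbf_gram(v, sigma=None):
    """RBF Gram matrix of a 1-D sample, median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    if sigma is None:
        sigma = np.sqrt(0.5 * np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(a, b):
    """Biased empirical HSIC between two 1-D samples."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / n ** 2

def resit_direction(x, y):
    """Return 'x->y' or 'y->x' by comparing residual dependence."""
    def residual_dependence(cause, effect):
        reg = KernelRidge(kernel="rbf", alpha=0.1)
        reg.fit(cause[:, None], effect)
        resid = effect - reg.predict(cause[:, None])  # residuals, cf. note 2
        return hsic(cause, resid)
    return "x->y" if residual_dependence(x, y) < residual_dependence(y, x) else "y->x"
```

On data generated as \(y = f(x) + N_y\), this rule should typically return 'x->y' at moderate noise levels; how it degrades as the noise level is swept is precisely what the paper measures.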


Acknowledgments

This work was partially supported by the European Union Horizon 2020 research programme within the project CITIES2030 “Co-creating resilient and sustainable food towards FOOD2030”, grant 101000640.

Author information

Correspondence to Benjamin Kap.

Appendix

1.1 Detailed Description of Estimators

  1. HSIC: Hilbert-Schmidt Independence Criterion with RBF kernel (see Note 11)

    $$I_{HSIC}(x,y) := ||C_{xy}||^2_{HS}$$

    where \(C_{xy}\) is the cross-covariance operator and \(||\cdot ||_{HS}\) the Hilbert-Schmidt norm.

  2. HSIC_IC: Hilbert-Schmidt Independence Criterion using an incomplete Cholesky decomposition (a low-rank decomposition of the Gram matrices, which permits an accurate approximation to HSIC as long as the kernel has a fast-decaying spectrum), with precision \(\eta = 10^{-6}\) in the incomplete Cholesky decomposition.

  3. HSIC_IC2: Same as HSIC_IC but with \(\eta = 10^{-2}\).

  4. DISTCOV: Distance covariance estimator using pairwise distances. This is the \(L^2_w\) norm of the difference between the joint characteristic function \(\varphi _{12}\) and the product of the marginal characteristic functions \(\varphi _1 \varphi _2\) of the inputs x and y:

    $$\varphi _{12}(\boldsymbol{u}^1,\boldsymbol{u}^2) = \mathbb {E}[e^{i\langle \boldsymbol{u}^1, \boldsymbol{x} \rangle + i\langle \boldsymbol{u}^2, \boldsymbol{y} \rangle }],$$
    $$\varphi _1(\boldsymbol{u}^1) = \mathbb {E}[e^{i\langle \boldsymbol{u}^1, \boldsymbol{x} \rangle }],$$
    $$\varphi _2(\boldsymbol{u}^2) = \mathbb {E}[e^{i\langle \boldsymbol{u}^2, \boldsymbol{y} \rangle }].$$

    With \(i = \sqrt{-1}\), \(\langle \cdot , \cdot \rangle \) the standard Euclidean inner product, and \(\mathbb {E}\) the expectation. Finally, we have:

    $$I_{dCov}(x,y) = ||\varphi _{12} - \varphi _1 \varphi _2||_{L^2_w}$$

  5. DISTCORR: Distance correlation estimator using pairwise distances. It is simply the standardized version of the distance covariance (a minimal NumPy sketch follows this list):

    $$I_{dCor}(x,y) = {\left\{ \begin{array}{ll} \frac{I_{dCov}(x,y)}{\sqrt{I_{dVar}(x,x)I_{dVar}(y,y)}}, &{}\text {if } I_{dVar}(x,x)I_{dVar}(y,y) > 0 \\ 0, &{} \text {otherwise,} \end{array}\right. }$$

    with

    $$I_{dVar}(x,x) = ||\varphi _{11} - \varphi _1 \varphi _1||_{L^2_w},\, I_{dVar}(y,y) = ||\varphi _{22} - \varphi _2 \varphi _2||_{L^2_w}$$

    (see the characteristic functions under item 4, DISTCOV).

  6. HOEFFDING: Hoeffding's Phi

    $$I_{\varPhi }(x,y) = I_{\varPhi }(C) = \left( h_2(d) \int _{[0,1]^d} [C(\boldsymbol{u}) - \Pi (\boldsymbol{u})]^2d\boldsymbol{u}\right) ^{\frac{1}{2}}$$

    with \(h_2(d)\) a normalization constant depending on the dimension d, C the copula of the input, and \(\Pi \) the product (independence) copula.

  7. SH_KNN: Shannon differential entropy estimator using kNNs (k-nearest neighbours); a minimal sketch follows this list:

    $$H(\boldsymbol{Y}_{1:T}) = \log (T-1) - \psi (k) + \log (V_d) + \frac{d}{T}\sum ^T_{t=1}\log (\rho _k(t))$$

    with T the number of samples, \(\psi \) the digamma function, \(V_d\) the volume of the d-dimensional unit ball, and \(\rho _k(t)\) the Euclidean distance of the \(k\)-th nearest neighbour of \(\boldsymbol{y}_t\) in the sample \(\boldsymbol{Y}_{1:T}\backslash \{\boldsymbol{y}_t\}\).

  8. SH_KNN_2: Same as SH_KNN but using a kd-tree for quick nearest-neighbour lookup.

  9. SH_KNN_3: Same as SH_KNN but with \(k=5\).

  10. SH_MAXENT1: Maximum entropy distribution-based Shannon entropy estimator

    $$H(\boldsymbol{Y}_{1:T}) = H(n) - \left[ k_1 \left( \frac{1}{T}\sum ^T_{t=1}G_1(y'_t)\right) ^2 + k_2 \left( \frac{1}{T}\sum ^T_{t=1}G_2(y'_t)-\sqrt{\frac{2}{\pi }}\right) ^2\right] + \log (\hat{\sigma }),$$

    with \(H(n)\) the differential entropy of a standard normal variable and

    $$\hat{\sigma } = \hat{\sigma }(\boldsymbol{Y}_{1:T}) = \sqrt{\frac{1}{T-1}\sum ^T_{t=1}(y_t)^2},$$
    $$y'_t = \frac{y_t}{\hat{\sigma }} \quad (t= 1, \dots , T),$$
    $$G_1(z) = ze^{\frac{-z^2}{2}},$$
    $$G_2(z) = |z|,$$
    $$k_1 = \frac{36}{8\sqrt{3}-9},$$
    $$k_2 = \frac{1}{2-\frac{6}{\pi }}.$$

  11. SH_MAXENT2: Maximum entropy distribution-based Shannon entropy estimator; same as SH_MAXENT1 with the following changes:

    $$G_2(z) = e^{\frac{-z^2}{2}},$$
    $$k_2 = \frac{24}{16\sqrt{3}-27}.$$

  12. SH_SPACING_V: Shannon entropy estimator using Vasicek's spacing method (a minimal sketch follows this list):

    $$H(\boldsymbol{Y}_{1:T}) = \frac{1}{T}\sum ^T_{t=1}\log \left( \frac{T}{2m}[y_{(t+m)}-y_{(t-m)}]\right) $$

    with T the number of samples, \(y_{(t)}\) the order statistics of the sample, the convention that \(y_{(t)} := y_{(1)}\) if \(t-m < 1\) and \(y_{(t)} := y_{(T)}\) if \(t+m > T\), and \(m = \lfloor \sqrt{T} \rfloor \).
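To make the definitions above concrete, here are minimal NumPy/SciPy sketches of three of the listed estimators: the distance correlation of items 4-5, the kNN entropy estimator of item 7, and Vasicek's spacing estimator of item 12. These are illustrative re-implementations of the stated formulas, not the ITE toolbox code used in the paper; the defaults (\(k = 3\), \(m = \lfloor \sqrt{T} \rfloor \)) are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def _double_center(v):
    """Double-centered pairwise-distance matrix of a 1-D sample."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def dist_corr(x, y):
    """Empirical distance correlation (items 4-5)."""
    a, b = _double_center(x), _double_center(y)
    dcov2 = (a * b).mean()                        # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()       # product of squared distance variances
    return np.sqrt(dcov2 / np.sqrt(dvar2)) if dvar2 > 0 else 0.0

def knn_entropy(y, k=3):
    """Kozachenko-Leonenko kNN Shannon entropy estimator (item 7), d = 1."""
    T, d = len(y), 1
    v_d = np.pi ** (d / 2) / gamma(d / 2 + 1)     # volume of the unit d-ball
    # distance to the k-th nearest neighbour, excluding the point itself
    rho = cKDTree(y[:, None]).query(y[:, None], k=k + 1)[0][:, k]
    return np.log(T - 1) - digamma(k) + np.log(v_d) + d * np.mean(np.log(rho))

def vasicek_entropy(y, m=None):
    """Vasicek spacing entropy estimator (item 12)."""
    T = len(y)
    m = m if m is not None else int(np.sqrt(T))
    ys = np.sort(y)                               # order statistics y_(1..T)
    upper = ys[np.minimum(np.arange(T) + m, T - 1)]  # y_(t+m), clamped to y_(T)
    lower = ys[np.maximum(np.arange(T) - m, 0)]      # y_(t-m), clamped to y_(1)
    return np.mean(np.log(T / (2 * m) * (upper - lower)))
```

As a sanity check, for a large standard-normal sample both knn_entropy and vasicek_entropy should approach the true value \(\frac{1}{2}\log (2\pi e) \approx 1.419\), and dist_corr of two independent samples should be close to zero.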


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Kap, B., Aleksandrova, M., Engel, T. (2022). The Effect of Noise Level on the Accuracy of Causal Discovery Methods with Additive Noise Models. In: Leiva, L.A., Pruski, C., Markovich, R., Najjar, A., Schommer, C. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2021. Communications in Computer and Information Science, vol 1530. Springer, Cham. https://doi.org/10.1007/978-3-030-93842-0_7


  • DOI: https://doi.org/10.1007/978-3-030-93842-0_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93841-3

  • Online ISBN: 978-3-030-93842-0

  • eBook Packages: Computer Science, Computer Science (R0)
