
A Wiener Causality Defined by Divergence

Abstract

Discovering causal relationships is a fundamental task in investigating the dynamics of complex systems (Pearl in Stat Surv 3:96–146, 2009). Traditional approaches such as Granger causality or transfer entropy fail to capture all the interdependence of the statistical moments, which may lead to wrong causal conclusions. In a previous paper (Chen et al. in 25th international conference, ICONIP 2018, Siem Reap, Cambodia, proceedings, Part II, 2018), the authors proposed a novel definition of Wiener causality for measuring the causal influence between time series based on relative entropy, providing an integrated description of statistical causal influence. In this work, we show that relative entropy is a special case of a more general family of divergence measures. We argue that any Bregman divergence can be used for detecting causal relations and, in theory, remedies the information dropout problem. We discuss the benefits of various choices of divergence functions for causal inference and the quality of the obtained causal models. As a byproduct, we also provide a robustness analysis and show that RE causality achieves a faster convergence rate among BD causalities. To substantiate our claims, we provide experimental evidence on how BD causalities improve detection accuracy.

Notes

  1. For the high-dimensional case, similar results can be derived in the same fashion.

  2. Also written as \(GC_{y\rightarrow x}=\ln \{\mathrm{tr}[\varSigma (x|x^{p})]/\mathrm{tr}[\varSigma (x|x^{p}\oplus y^{q})]\}\); see the numerical sketch following these notes.
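As a concrete illustration of the trace formula in Note 2, the following sketch estimates \(GC_{y\rightarrow x}\) by comparing the residual covariance traces of a restricted and a full least-squares autoregression. The lag orders, the simulated VAR system, and the helper name `granger_causality_trace` are assumptions made for demonstration only, not part of the paper.

```python
import numpy as np

def granger_causality_trace(x, y, p=2, q=2):
    """Estimate GC_{y->x} = ln{ tr[Sigma(x|x^p)] / tr[Sigma(x|x^p (+) y^q)] }
    by ordinary least squares on lagged regressors (illustrative sketch)."""
    T = len(x)
    m = max(p, q)
    # Lagged design matrices: the restricted model uses past of x only,
    # the full model adds the past of y.
    X_past = np.column_stack([x[m - k:T - k] for k in range(1, p + 1)])
    Y_past = np.column_stack([y[m - k:T - k] for k in range(1, q + 1)])
    target = x[m:]

    def residual_cov_trace(design):
        design = np.column_stack([np.ones(len(target)), design])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return np.atleast_2d(np.cov(resid)).trace()

    restricted = residual_cov_trace(X_past)
    full = residual_cov_trace(np.column_stack([X_past, Y_past]))
    return np.log(restricted / full)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 5000
    x = np.zeros(T); y = np.zeros(T)
    for t in range(1, T):            # y drives x, not vice versa
        y[t] = 0.6 * y[t - 1] + rng.normal()
        x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + rng.normal()
    print("GC y->x:", granger_causality_trace(x, y))   # noticeably > 0
    print("GC x->y:", granger_causality_trace(y, x))   # close to 0
```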

References

  1. Pearl J (2009) Causal inference in statistics: an overview. Stat Surv 3:96–146

  2. Chen JY, Feng JF, Lu WL (2018) A Wiener causality defined by relative entropy. In: 25th International conference, ICONIP 2018, Siem Reap, Cambodia, proceedings, Part II

  3. Aristotle (2018) Metaphysics: book iota. Clarendon Press, UK

  4. Brock W (1991) Causality, chaos, explanation and prediction in economics and finance. In: Beyond belief: randomness, prediction and explanation in science, pp 230–279

  5. Anscombe E (2018) Causality and determination. Agency and responsibility. Routledge, pp 57–73

  6. Pearl J (1999) Causality: models, reasoning, and inference. Cambridge University Press, Cambridge

  7. Wiener N (1956) The theory of prediction, modern mathematics for engineers. McGraw-Hill, New York

  8. Granger C (1969) Investigating causal relations by econometric models and cross-spectral methods. Econ J Econ Soc 37(3):424–438

  9. Schreiber T (2000) Measuring information transfer. Phys Rev Lett 85(2):461

  10. Valenza G, Faes L, Citi L, Orini M, Barbieri R (2018) Instantaneous transfer entropy for the study of cardiovascular and cardiorespiratory nonstationary dynamics. IEEE Trans Biomed Eng 65(5):1077–1085

  11. Liang XS, Kleeman R (2005) Information transfer between dynamical system components. Phys Rev Lett 95(24):244101

  12. Barnett L, Barrett AB, Seth AK (2009) Granger causality and transfer entropy are equivalent for Gaussian variables. Phys Rev Lett 103(23):238701

  13. Ding M, Chen Y, Bressler S (2006) Handbook of time series analysis: recent theoretical developments and applications. Wiley, Weinheim

  14. Seth AK, Barrett AB, Barnett L (2015) Granger causality analysis in neuroscience and neuroimaging. J Neurosci 35(8):3293–3297

  15. Bregman LM (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput Math Math Phys 7:3

  16. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79

  17. Cliff OM, Prokopenko M, Fitch R (2018) Minimising the Kullback–Leibler divergence for model selection in distributed nonlinear systems. Entropy 20(2):51

  18. Cover TM, Thomas JA (1991) Elements of information theory. Wiley, New York

  19. Si S, Tao D, Geng B (2010) Bregman divergence-based regularization for transfer subspace learning. IEEE Trans Knowl Data Eng 22(7):929–942

  20. Amari SI (2009) Divergence, optimization and geometry. In: International conference on neural information processing, pp 185–193

  21. Oseledec VI (1968) A multiplicative ergodic theorem: Liapunov characteristic number for dynamical systems. Trans Mosc Math Soc 19:197

  22. Hosoya Y (2001) Elimination of third-series effect and defining partial measures of causality. J Time Ser 22:537

  23. Rissanen JJ (1996) Fisher information and stochastic complexity. IEEE Trans Inf Theory 42(1):40–47

  24. He SY (1998) Parameter estimation of hidden periodic model in random fields. Sci China A 42(3):238

  25. Ma HF, Leng SY, Tao CY, Ying X, Kurths J, Lai YC, Lin W (2017) Detection of time delays and directional interactions based on time series from complex dynamical systems. Phys Rev E 96(1):012221

  26. Reynolds D (2015) Gaussian mixture models. Encyclopedia of biometrics. Springer, Berlin, pp 827–832

  27. Goldberger J, Gordon S, Greenspan H (2003) An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. In: Proceedings ninth IEEE international conference on computer vision, Nice, France, pp 487–493

  28. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Methodol 39(1):1–22

  29. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305

  30. Geweke J (1989) Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57:1317–1339

  31. Burda Y, Grosse R, Salakhutdinov R (2015) Importance weighted autoencoders. arXiv preprint arXiv:1509.00519

  32. Chen LQ, Tao CY, Zhang RY, Henao R, Carin L (2018) Variational inference and model selection with generalized evidence bounds. In: International conference on machine learning, pp 892–901

  33. Tao CY, Chen LQ, Henao R, Feng JF, Carin L (2018) Chi-square generative adversarial network. In: International conference on machine learning, pp 4894–4903

  34. Tao CY, Dai SY, Chen LQ, Bai K, Chen JY, Liu C, Zhang RY, Georgiy Bobashev G, Carin L (2019) Variational annealing of GANs: a Langevin perspective. In: International conference on machine learning, pp 6176–6185

  35. Liu P, Zeng Z, Wang J (2016) Multistability analysis of a general class of recurrent neural networks with non-monotonic activation functions and time-varying delays. Neural Netw 79:117–127

  36. Liu P, Zeng Z, Wang J (2017) Multiple Mittag–Leffler stability of fractional-order recurrent neural networks. IEEE Trans Syst Man Cybern Syst 47(8):2279–2288

  37. Wu A, Zeng Z (2013) Lagrange stability of memristive neural networks with discrete and distributed delays. IEEE Trans Neural Netw Learn Syst 25(4):690–703

  38. Fahrmeir L, Kaufmann H (1985) Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann Stat 13(1):342–368

Author information

Corresponding author

Correspondence to Wenlian Lu.

Additional information

This work is jointly supported by the National Natural Science Foundation of China under Grant No. 61673119, the Key Program of the National Science Foundation of China under Grant No. 91630314, the 111 Project (No. B18015), the Key Project of Shanghai Science and Technology No. 16JC1420402, Shanghai Municipal Science and Technology Major Project No. 2018SHZDZX01 and ZJ LAB.

Appendices

1.1 Proof of Theorem 1

Proof

By the LMS approach, we solve

$$\begin{aligned} b_T = \left[ \frac{1}{T}\sum _{i = 1}^T\left( y_i^\top y_i\right) \right] ^{-1}\left[ \frac{1}{T}\sum _{i = 1}^T\left( y_i^\top x_i\right) \right] \end{aligned}$$

which converges to \(b = {\mathbb {E}}\left[ y^\top y\right] ^{-1}{\mathbb {E}}\left[ y^\top x\right] \) as \(T\) goes to infinity.
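A minimal numerical sketch of the estimator above, assuming a scalar toy model \(x = y\,b + \text{noise}\) chosen only for illustration: the sample coefficient \(b_T\) approaches \({\mathbb {E}}[y^\top y]^{-1}{\mathbb {E}}[y^\top x]\) as \(T\) grows.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
b_true = 0.8                                   # assumed toy coefficient
y = rng.normal(size=(T, 1))
x = y * b_true + 0.3 * rng.normal(size=(T, 1))

# b_T = [ (1/T) sum y_i^T y_i ]^{-1} [ (1/T) sum y_i^T x_i ]
b_T = np.linalg.solve(y.T @ y / T, y.T @ x / T)
print(b_T)   # tends to E[y^T y]^{-1} E[y^T x] = b_true as T -> infinity
```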

Then, let \(\epsilon = x-y\cdot b_T\), where \(y\cdot b_T\) works as a predictor. Let \(p(x, y)\) be the joint distribution of \((x, y)\), which is Gaussian as assumed. Then, the joint density function of \((\epsilon , y)\) should be \(p(\epsilon +y\cdot b_T, y)\), which can be written as:

$$\begin{aligned} \begin{aligned} p(\epsilon +y\cdot b_T, y)&= (2\pi )^{-(n+m)/2}\det \left[ \varSigma (x\oplus y)\right] ^{-1/2}\\&\quad \exp \Bigg \{-\frac{1}{2}\left[ \epsilon +y\cdot b_T-{\mathbb {E}}\left[ x\right] , y-{\mathbb {E}}\left[ y\right] \right] \\&\quad \varSigma (x\oplus y)^{-1}\left[ \epsilon +y\cdot b_T - {\mathbb {E}}\left[ x\right] , y-{\mathbb {E}}\left[ y\right] \right] ^\top \Bigg \} \end{aligned} \end{aligned}$$

and the density function of y should be

$$\begin{aligned} p(y) = (2\pi )^{-m/2}\det \left[ \varSigma (y)\right] ^{-1/2}\exp \left\{ -\frac{1}{2}\left[ y-{\mathbb {E}}(y)\right] \varSigma (y)^{-1}\left[ y-{\mathbb {E}}(y)\right] ^\top \right\} \end{aligned}$$

Note

$$\begin{aligned} \varSigma (x\oplus y) = \begin{bmatrix} \varSigma (x) &amp; \varSigma (x,y)\\ \varSigma (x,y)^\top &amp; \varSigma (y) \end{bmatrix} \end{aligned}$$

of which the inverse becomes:

$$\begin{aligned} \varSigma (x\oplus y)^{-1} = \begin{bmatrix} A &amp; B\\ B^\top &amp; C \end{bmatrix} \end{aligned}$$

with

$$\begin{aligned} \begin{aligned} A&= \left[ \varSigma (x) - \varSigma (x,y)\varSigma (y)^{-1}\varSigma (x,y)^\top \right] ^{-1} = \varSigma (x|y)^{-1}\\ B&= -\varSigma (x|y)^{-1}\varSigma (x,y)\varSigma (y)^{-1}\\ C&= \varSigma (y)^{-1} + \varSigma (y)^{-1}\varSigma (x,y)^\top \varSigma (x|y)^{-1}\varSigma (x,y)\varSigma (y)^{-1}. \end{aligned} \end{aligned}$$
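The Schur-complement identities for \(A\), \(B\), \(C\) above can be checked numerically. The sketch below builds a random positive-definite joint covariance (an assumption made purely for illustration) and compares the block formulas with a direct matrix inverse.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
M = rng.normal(size=(n + m, n + m))
S = M @ M.T + (n + m) * np.eye(n + m)      # random positive-definite Sigma(x (+) y)
Sx, Sy = S[:n, :n], S[n:, n:]
Sxy = S[:n, n:]

Sx_given_y = Sx - Sxy @ np.linalg.inv(Sy) @ Sxy.T      # Sigma(x|y)
A = np.linalg.inv(Sx_given_y)
B = -A @ Sxy @ np.linalg.inv(Sy)
C = np.linalg.inv(Sy) + np.linalg.inv(Sy) @ Sxy.T @ A @ Sxy @ np.linalg.inv(Sy)

blockwise = np.block([[A, B], [B.T, C]])
assert np.allclose(blockwise, np.linalg.inv(S))         # matches the direct inverse
```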

Then, the logarithm of the conditional density of \(\epsilon \) w.r.t. \(y\) is:

$$\begin{aligned} \begin{aligned} \ln \left[ p(\epsilon +y\cdot b_T,y)/p(y)\right] \sim -\frac{1}{2}\left\{ \epsilon - \left[ \left( y-{\mathbb {E}}\left[ y\right] \right) B^\top A^{-1}+{\mathbb {E}}\left[ x\right] - y\cdot b_T\right] \right\} \\ \varSigma (x|y)^{-1}\left\{ \epsilon -\left[ \left( y-{\mathbb {E}}\left[ y\right] \right) B^\top A^{-1}+{\mathbb {E}}\left[ x\right] -y\cdot b_T\right] \right\} \end{aligned} \end{aligned}$$

by neglecting the terms without \(\epsilon \). Given the sample data \(X\) and \(Y\), let \(\Psi = [\epsilon _1, \ldots , \epsilon _T]\) with \(\epsilon _i = x_i-y_i\cdot b_T\). By maximum likelihood, the conditional density of \(\epsilon \) w.r.t. \(Y\) becomes

$$\begin{aligned} p_T(\epsilon |Y)\sim N\left( \left( \frac{1}{T}\sum _{i = 1}^Ty_i - {\mathbb {E}}\left[ y\right] \right) B^\top A^{-1}+{\mathbb {E}}\left[ x\right] - y\cdot b_T, \varSigma (x|y)\right) \end{aligned}$$

which converges to

$$\begin{aligned} p(\epsilon |Y)\sim N\left( {\mathbb {E}}\left[ x\right] - {\mathbb {E}}\left[ y\right] b_T, \varSigma (x|y)\right) . \end{aligned}$$

\(\square \)
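To make the conclusion of Theorem 1 concrete, the sketch below draws a zero-mean jointly Gaussian \((x, y)\), fits \(b_T\) by least squares, and compares the empirical mean and variance of \(\epsilon = x - y\cdot b_T\) with the limiting values \({\mathbb {E}}[x] - {\mathbb {E}}[y]b\) and \(\varSigma (x|y)\). The particular covariance matrix is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200_000
# Assumed zero-mean joint Gaussian (x, y1, y2) with scalar x -- illustration only.
S = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.2],
              [0.3, 0.2, 1.0]])
data = rng.multivariate_normal(np.zeros(3), S, size=T)
x, y = data[:, :1], data[:, 1:]

b_T = np.linalg.solve(y.T @ y / T, y.T @ x / T)     # least-squares coefficient
eps = x - y @ b_T                                   # prediction residual

Sx, Sy, Sxy = S[:1, :1], S[1:, 1:], S[:1, 1:]
S_x_given_y = Sx - Sxy @ np.linalg.inv(Sy) @ Sxy.T  # Sigma(x|y)
print("empirical var(eps):", eps.var(), " Sigma(x|y):", S_x_given_y[0, 0])
print("empirical mean(eps):", eps.mean(), " E[x] - E[y]b = 0 here")
```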

1.2 Proof of Theorem 2

We start this section with a straightforward proposition:

Proposition 1

The expected value of the score function is zero.

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \left[ s(\theta |X)\right]&= \int p(x;\theta ) \frac{\partial }{\partial \theta }\log p(x;\theta ) dx\\&= \int \frac{\partial p(x;\theta )}{\partial \theta } dx = \frac{\partial }{\partial \theta }\int p(x;\theta )dx = 0. \end{aligned} \end{aligned}$$
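A quick Monte Carlo check of Proposition 1, assuming a Gaussian family with unknown mean (an illustrative choice of model, not one prescribed by the paper): the score \(s(\theta |X) = (X-\theta )/\sigma ^2\) averages to zero under \(p(x;\theta )\).

```python
import numpy as np

rng = np.random.default_rng(4)
theta, sigma = 1.5, 2.0
X = rng.normal(theta, sigma, size=1_000_000)
score = (X - theta) / sigma**2          # d/dtheta log N(x; theta, sigma^2)
print(score.mean())                     # approximately 0, as Proposition 1 states
```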

From an information-theoretic perspective, MLE can be justified as asymptotically minimizing the KL divergence between the true data-generating distribution p(x) and the parameterized (approximating) pdf \(p_\theta (x)\).

The MLE satisfies two appealing properties, consistency and asymptotic normality [38].

  • Consistency The estimator \(\theta _n\rightarrow \theta _0\) in probability as \(n\rightarrow \infty \), where \(\theta _0\) is the target unknown parameter of the sample distribution.

  • Asymptotic normality The estimator \(\theta _n\) and its target parameter \(\theta _0\) satisfy the following relation (illustrated by the sketch after this list):

    $$\begin{aligned} \sqrt{n}(\theta _n - \theta _0)\xrightarrow {D}N(0, I^{-1}(\theta _0)) \end{aligned}$$
    (18)
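A simulation sketch of (18) for the Gaussian-mean family, where the MLE is the sample mean and \(I(\theta _0) = 1/\sigma ^2\); the family and the parameter values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
theta0, sigma, n, reps = 0.7, 1.3, 400, 20_000
samples = rng.normal(theta0, sigma, size=(reps, n))
theta_n = samples.mean(axis=1)                     # MLE of the mean
z = np.sqrt(n) * (theta_n - theta0)                # sqrt(n) (theta_n - theta_0)
print("empirical var:", z.var(), " I^{-1}(theta_0) = sigma^2 =", sigma**2)
```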

Here \(\theta _0\) is the maximizer of the expected log-likelihood. Namely,

$$\begin{aligned} \theta _0 = \arg \max _{\theta } {\mathbb {E}}\left[ \log p(X;\theta )\right] . \end{aligned}$$

Let \(\theta _n\) be the MLE. Note that the MLE solves the equation:

$$\begin{aligned} \sum _{i=1}^n s(\theta _n|X_i) = 0 \end{aligned}$$

Since \(\theta _0\) is the maximizer of \({\mathbb {E}}\left[ \log p(X;\theta )\right] \), it also satisfies \({\mathbb {E}}\left[ s(\theta _0|X)\right] = 0\). Recall that \({\tilde{\theta }}_n\) is the MLE of the perturbed data, and consider the following expansion:

$$\begin{aligned} \begin{aligned} 0=&\frac{1}{n}\sum _{i = 1}^n s({\tilde{\theta }}_n|{\tilde{X}}_i) - s(\theta _n|X_i)\\ =&\frac{1}{n}\sum _{i = 1}^n s({\tilde{\theta }}_n|X_i) - s(\theta _n|X_i) - s({\tilde{\theta }}_n|X_i) + s({\tilde{\theta }}_n|{\tilde{X}}_i) \\ \approx&\frac{1}{n}\sum _{i = 1}^n({\tilde{\theta }}_n - \theta _n)s_\theta (\theta _n|X_i) - (X_i-{\tilde{X}}_i)s_X({\tilde{\theta }}_n|{\tilde{X}}_i).\\ \end{aligned} \end{aligned}$$
(19)

For the first term,

$$\begin{aligned} \frac{1}{n}\sum _{i = 1}^n({\tilde{\theta }}_n - \theta _n)s_\theta (\theta _n|X_i) \approx ({\tilde{\theta }}_n - \theta _n){\mathbb {E}}\left[ s_\theta (\theta _0|X_i)\right] \end{aligned}$$

as \({\tilde{\theta }}_n\rightarrow \theta _0\).

By the central limit theorem, the second term satisfies

$$\begin{aligned} \begin{aligned} \frac{1}{n}\sum _{i = 1}^n (X_i - {\tilde{X}}_i)s_X({\tilde{\theta }}_n|{\tilde{X}}_i)&= \frac{1}{n}\sum _{i = 1}^n (X_i - {\tilde{X}}_i)s_X({\tilde{\theta }}_n|{\tilde{X}}_i) - {\mathbb {E}}\left[ \epsilon s_X({\tilde{\theta }}_0|{\tilde{X}}_i)\right] \\&\xrightarrow {D} N\left( 0, \frac{{\tilde{\sigma }}^2}{n}\right) . \end{aligned} \end{aligned}$$

where

$$\begin{aligned} {\tilde{\sigma }}^2 = \mathrm {Var}\left[ \epsilon s_X({\tilde{\theta }}_0|{\tilde{X}}_i)\right] . \end{aligned}$$

Therefore, rearranging the quantities in Eq. (19),

$$\begin{aligned} \begin{aligned} \sqrt{n}({\tilde{\theta }}_n - \theta _n)&= \frac{\sqrt{n}}{{\mathbb {E}}\left[ s_\theta (\theta _0|X_i)\right] }\left[ \frac{1}{n}\sum _{i = 1}^n(X_i-{\tilde{X}}_i)s_X({\tilde{\theta }}_n|{\tilde{X}}_i) - {\mathbb {E}}\left[ \epsilon s_X({\tilde{\theta }}_0|{\tilde{X}}_i)\right] \right] \\&\xrightarrow {D}N\left( 0, \frac{{\tilde{\sigma }}^2}{{\mathbb {E}}^2\left[ s_\theta (\theta _0|X_i)\right] }\right) \end{aligned} \end{aligned}$$
(20)

Now we have already derived the asymptotic normality. The next step is to simplify the asymptotic variance.

First we focus on \(s_\theta (\theta _0|X_i)\):

$$\begin{aligned} \begin{aligned} s_\theta (\theta _0|X_i)&= \frac{\partial ^2}{\partial \theta ^2} \log p(X_i;\theta _0) =\frac{\partial }{\partial \theta }\frac{\frac{\partial }{\partial \theta }p(X_i;\theta _0)}{p(X_i;\theta _0)} \\&= \frac{\frac{\partial ^2}{\partial \theta ^2}p(X_i;\theta _0)}{p(X_i;\theta _0)} - \left( \frac{\frac{\partial }{\partial \theta }p(X_i;\theta _0)}{p(X_i;\theta _0)} \right) ^2\\&= \frac{\frac{\partial ^2}{\partial \theta ^2}p(X_i;\theta _0)}{p(X_i;\theta _0)} -\left( \frac{\partial }{\partial \theta } \log p(X_i; \theta _0)\right) ^2\\&= \frac{\frac{\partial ^2}{\partial \theta ^2}p(X_i;\theta _0)}{p(X_i;\theta _0)} - s^2(\theta _0|X_i). \end{aligned} \end{aligned}$$

For the first quantity,

$$\begin{aligned} {\mathbb {E}}\left[ \frac{\frac{\partial ^2}{\partial \theta ^2}p(X_i;\theta _0)}{p(X_i;\theta _0)} \right] = \int \frac{\frac{\partial ^2}{\partial \theta ^2}p(x;\theta _0)}{p(x;\theta _0)} p(x;\theta _0)dx = \frac{\partial ^2}{\partial \theta ^2}\int p(x;\theta _0)dx = 0 \end{aligned}$$

where the order of differentiation and integration has been exchanged. Combining the two quantities, we have:

$$\begin{aligned} {\mathbb {E}}\left[ s_\theta (\theta _0|X_i)\right] = {\mathbb {E}}\left[ \frac{\frac{\partial ^2}{\partial \theta ^2}p(X_i;\theta _0)}{p(X_i;\theta _0)} \right] - {\mathbb {E}}\left[ s^2(\theta _0|X_i)\right] = -I(\theta _0). \end{aligned}$$
(21)
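The identity (21) can be verified numerically for the same Gaussian-mean family used in the earlier sketches (an assumed example): the second derivative of the log-density is the constant \(-1/\sigma ^2\), whose expectation equals \(-I(\theta _0)\).

```python
import numpy as np

rng = np.random.default_rng(6)
theta0, sigma = 0.0, 1.7
X = rng.normal(theta0, sigma, size=1_000_000)
score = (X - theta0) / sigma**2                 # s(theta_0 | X)
score_grad = np.full_like(X, -1.0 / sigma**2)   # s_theta(theta_0 | X)
fisher = np.mean(score**2)                      # I(theta_0) = E[s^2]
print(score_grad.mean(), "vs", -fisher)         # both approximately -1/sigma^2
```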

Plugging (21) into Eq. (20) gives

$$\begin{aligned} \sqrt{n}\left( {\tilde{\theta }}_n -\theta _n\right) \xrightarrow {D}N\left( 0, \frac{{\tilde{\sigma }}^2}{I^2(\theta _0)}\right) \end{aligned}$$

Together with (18), \({\tilde{\theta }}_n\) is a consistent estimator of \(\theta _0\).
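For the Gaussian-mean family (again an assumed illustration) the MLE is the sample mean, \(s_X(\theta |x) = 1/\sigma ^2\), and the asymptotic variance in (20) reduces to \(\mathrm {Var}(\epsilon )\). The sketch below checks this against a direct simulation of perturbed-data MLEs; the perturbation scale `tau` is an assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
theta0, sigma, n, reps = 0.0, 1.1, 500, 20_000
tau = 0.3                                          # std of the data perturbation eps
X = rng.normal(theta0, sigma, size=(reps, n))
X_tilde = X + rng.normal(0.0, tau, size=(reps, n)) # perturbed samples
diff = np.sqrt(n) * (X_tilde.mean(axis=1) - X.mean(axis=1))
# Theorem 2 predicts variance sigma_tilde^2 / I^2(theta_0) = Var(eps) = tau^2 here.
print("empirical var:", diff.var(), " predicted:", tau**2)
```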

1.3 Proof of Theorem 3

Proof

Considering the Taylor expansion of \(p(\theta _n)\) around \(\theta _0\), we arrive at:

$$\begin{aligned} p(\theta _n) = p(\theta _0) + \varDelta \theta ^\top _n \frac{\partial p(\theta _0)}{\partial \theta _n} +\frac{1}{2}\varDelta \theta _n^\top \frac{\partial ^2 p(\theta _0)}{\partial \theta _n^2} \varDelta \theta _n + o\left( (\varDelta \theta _n)^2\right) \end{aligned}$$

with \(\varDelta \theta _n = \theta _n - \theta _0\).

We first derive the asymptotic normality under the KL-divergence framework. Since the parameters \(\theta _n\) are obtained via MLE, which is equivalent to minimizing the KL-divergence, the first derivative of the KL-divergence vanishes at \(\theta _n = \theta _0\):

$$\begin{aligned} \frac{\partial }{\partial \theta _n}\Big |_{\theta _n = \theta _0}KL\left( p(\theta _n)||p(\theta _0)\right) = 0 \end{aligned}$$

By Taylor expansion we obtain up to second order:

$$\begin{aligned} KL(p(\theta _n)||p(\theta _0)) = \frac{1}{2}\varDelta \theta _n^\top H(\theta _0) \varDelta \theta _n +o\left( (\varDelta \theta _n)^2\right) \end{aligned}$$

where H stands for the Hessian matrix of KL divergence.

Note that \(H(\theta _0)\) is equivalent to \(I(\theta _0)\):

$$\begin{aligned} \begin{aligned} H(\theta _0)&= \frac{\partial ^2}{\partial \theta ^2}\Bigg |_{\theta = \theta _0}KL(p(\theta )||p(\theta _0))\\&=\int \frac{\partial }{\partial \theta }\left( \log p(\theta _0)+1\right) \frac{\partial p(\theta _0)}{\partial \theta }dx - \int \frac{\partial ^2 p(\theta _0)}{\partial \theta ^2}\log p(\theta _0)dx\\&=\int \frac{1}{p(\theta _0)}\left( \frac{\partial p(\theta _0)}{\partial \theta }\right) ^2+\frac{\partial ^2 p(\theta _0)}{\partial \theta ^2}+ \frac{\partial ^2 p(\theta _0)}{\partial \theta ^2}\log p(\theta _0) - \frac{\partial ^2p(\theta _0)}{\partial \theta ^2}\log p(\theta _0)dx\\&= {\mathbb {E}}\left[ \left( \frac{\partial \log p(\theta _0)}{\partial \theta }\right) ^2\right] = I(\theta _0). \end{aligned} \end{aligned}$$

where the equality in the last line arises from Proposition 1. Together with the asymptotic normality of \(\varDelta \theta _n\) in (18), we have:

$$\begin{aligned} D_{KL}\left( p(\theta _n)||p(\theta _0)\right) = \frac{1}{2n}\left( (\varDelta \theta _n)^\top I^{1/2}\sqrt{n}\right) \cdot \left( \sqrt{n} I^{1/2}\varDelta \theta _n\right) \sim \frac{1}{2n}\chi ^2(n-1), \end{aligned}$$
(22)

as \(n\) goes to infinity.
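To visualize the \(O(1/n)\) rate in (22), the sketch below simulates the Gaussian-mean MLE (an assumed one-parameter family): its KL divergence to the target is \((\theta _n-\theta _0)^2/(2\sigma ^2)\), and \(2n\cdot KL\) concentrates around a constant, so the divergence itself vanishes at rate \(1/n\).

```python
import numpy as np

rng = np.random.default_rng(8)
theta0, sigma, n, reps = 0.0, 1.0, 1_000, 50_000
theta_n = rng.normal(theta0, sigma / np.sqrt(n), size=reps)   # asymptotic law of the MLE
kl = (theta_n - theta0) ** 2 / (2 * sigma**2)                 # KL(N(theta_n) || N(theta_0))
print("mean of 2n*KL:", (2 * n * kl).mean())                  # approx 1, i.e. KL = O(1/n)
```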

In contrast, under other (non-logarithmic) Bregman divergence frameworks, the Taylor expansion yields:

$$\begin{aligned} \begin{aligned} \triangle _\psi \left( p(\theta _n)||p(\theta _0)\right)&= \left( \varDelta \theta _n\right) ^\top \frac{\partial \psi }{\partial \theta _n}+\frac{1}{2}\left( \varDelta \theta _n\right) ^\top \frac{\partial ^2\psi }{\partial \theta _n^2}\varDelta \theta _n +o\left( (\varDelta \theta _n)^2\right) \\&=\left( \varDelta \theta _n\right) ^\top \frac{\partial \psi }{\partial \theta _n} \sim N\left( 0, \frac{1}{n}\frac{\partial \psi }{\partial \theta } I^{-1}\left( \frac{\partial \psi }{\partial \theta }\right) ^\top \right) , \end{aligned} \end{aligned}$$
(23)

as \(n\) goes to infinity.

By (22) and (23), the BD causality between \(p(\theta _n)\) and \(p(\theta _0)\) converges to zero asymptotically. However, the convergence rates are different: \(1/n\) for relative-entropy causality and \(1/\sqrt{n}\) for general Bregman-divergence causality, respectively. \(\square \)
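The rate contrast above comes down to whether the linear Taylor term survives: with \(\varDelta \theta _n \sim N(0, I^{-1}(\theta _0)/n)\), a purely quadratic functional of \(\varDelta \theta _n\) scales as \(1/n\), while a functional with a nonvanishing linear term scales as \(1/\sqrt{n}\). The sketch below illustrates only this generic scaling; the two toy functionals are assumptions, not the paper's divergences.

```python
import numpy as np

rng = np.random.default_rng(9)
for n in (100, 1_000, 10_000):
    dtheta = rng.normal(0.0, 1.0 / np.sqrt(n), size=100_000)   # Delta theta_n ~ N(0, 1/n)
    quad = 0.5 * dtheta**2          # quadratic term only (KL-like behaviour, ~ c/n)
    lin = np.abs(2.0 * dtheta)      # surviving linear term (general BD-like, ~ c/sqrt(n))
    print(n, quad.mean(), lin.mean())
```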

Cite this article

Chen, J., Feng, J. & Lu, W. A Wiener Causality Defined by Divergence. Neural Process Lett 53, 1773–1794 (2021). https://doi.org/10.1007/s11063-019-10187-6
