1 Introduction

Resting-state functional magnetic resonance imaging (rsfMRI) provides important information about the functional connectivity of brain networks. Correlation analysis is one of the main approaches for analyzing functional connectivity. Since rsfMRI signals from one brain region may be influenced by other regions, the correlation coefficient between any two regions may not reflect the true intrinsic functional connectivity between them. As a result, the partial correlation coefficient is typically used to compute the true functional connectivity of brain networks [9, 10, 14]. The partial correlation coefficient is the conditional correlation between two signals of interest after regressing out the influence of other signals. It is related to the Gaussian graphical model [19] in network analysis and has been used, via frequency-domain representations, to regress out dynamic dependence between time series [2, 4]. It is also closely related to the concept of Granger causality [6, 7], which provides a way to estimate the linear dependence between different time series.

In this paper, we revisit the partial correlation and causality analysis of Geweke and Granger using a general dynamic regression model. In this model, the signal of interest is decomposed into two components: one term represents the influence from other regions of interest, and the second, residual, term is the intrinsic fluctuation of the signal. We propose an optimal decomposition that minimizes the entropy rate (i.e., the uncertainty) of the residual signal. Using the theory of optimal prediction [8, 11, 17], we derive several novel measures and theoretical results for partial correlation and causality analysis. In particular, we introduce a novel causality measure that characterizes the frequency-domain distribution of the variance explained by the regressor, information not provided by the popular Geweke causality measure [6]. We applied the proposed methods to rsfMRI data of 100 subjects from the Human Connectome Project (HCP). Our experimental results show that the proposed method explains a significant portion of the variance of rsfMRI signals in the 0–0.1 Hz low-frequency range that is not explained by existing methods. Thus, our method provides a new tool for investigating functional brain networks.

2 Theory

2.1 Causal-Filter Regression

Consider a zero-mean, wide-sense stationary Gaussian process \(\{{u}(t)\}_{t\in {\mathbb Z}}\), where \({u}(t)\in {\mathbb R}^{n}\) represents an n-dimensional measure of brain functional activity. We assume that \({u}(t)\) can be decomposed as \({u}(t)^T=[{x}(t)^T\; {y}(t)^T]\), where \({x}(t)\in {\mathbb R}^k\) represents the signal of interest and \({y}(t)\in {\mathbb R}^{n-k}\) represents the signal from a complementary set of brain regions. To investigate the dynamical linear dependence between \({x}(t)\) and \({y}(t)\), we consider the following linear model:

$$\begin{aligned} {x}(t)=\sum _{\ell =1}^{\infty }F_\ell {y}(t-\ell )+\tilde{x}(t), \end{aligned}$$
(1)

where \(F_{\ell }\in {\mathbb R}^{k\times (n-k)}\) are the regression coefficients and the residual \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\) represents the intrinsic signal of interest. To simplify notation, we let z denote the lag operator such that \(z^\ell {y}(t)={y}(t-\ell )\), and let \(F(z)=\sum _{\ell =1}^{\infty }F_\ell z^\ell \). Then, Eq. (1) can be rewritten as:

$$\begin{aligned} {x}(t)=F(z){y}(t)+\tilde{x}(t). \end{aligned}$$
(2)

We denote the power spectral density (PSD) of the joint process \(\{{u}(t)\}_{t\in {\mathbb Z}}\) by \( S_{{u}{u}}(\theta )=\left[ \begin{matrix}S_{{x}{x}}(\theta ) & S_{{x}{y}}(\theta )\\ S_{{y}{x}}(\theta ) & S_{{y}{y}}(\theta )\end{matrix} \right] , \) for \(\theta \in [-\pi , \pi ]\). Then, the PSD of the residual \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\) is given by:

$$\begin{aligned} S_{\tilde{x}\tilde{x}}(\theta )=\left[ I\;\;\; -F(e^{i\theta })\right] \left[ \begin{matrix}S_{{x}{x}}(\theta ) & S_{{x}{y}}(\theta )\\ S_{{y}{x}}(\theta ) & S_{{y}{y}}(\theta )\end{matrix} \right] \left[ \begin{matrix}I\\ -F(e^{i\theta })^*\end{matrix} \right] . \end{aligned}$$
(3)
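As a concrete illustration, Eq. (3) can be evaluated numerically at a single frequency once the joint PSD and the filter response are available. The following is a minimal Python sketch; the function name and interface are illustrative choices of ours, not part of the method:

```python
import numpy as np

def residual_psd(S_uu, F, k):
    """Evaluate the quadratic form of Eq. (3) at a single frequency theta.

    S_uu : (n, n) complex Hermitian joint PSD matrix S_uu(theta)
    F    : (k, n-k) complex filter response F(e^{i*theta})
    k    : dimension of the signal of interest x(t)
    """
    L = np.hstack([np.eye(k), -F])   # left factor [I  -F(e^{i*theta})] of Eq. (3)
    return L @ S_uu @ L.conj().T     # = S_xx - F S_yx - S_xy F* + F S_yy F*
```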

We assume that \(S_{{u}{u}}(\theta )\) is bounded and positive definite and \(\Vert F(e^{i\theta })\Vert \) is finite, so that \(S_{\tilde{x}\tilde{x}}(\theta )\) is also positive definite and bounded. Thus, \(\log \det (S_{\tilde{x}\tilde{x}}(\theta ))\) is integrable, and \(S_{\tilde{x}\tilde{x}}(\theta )\) admits a spectral factorization [11] given by:

$$ S_{\tilde{x}\tilde{x}}(\theta )=P_{\tilde{x}}(e^{i\theta })^{-1}\varOmega _{\tilde{x}\tilde{x}}P_{\tilde{x}}(e^{i\theta })^{-*},\quad \text {s.t.} \quad P_{\tilde{x}}(z)=I-\sum _{\ell =1}^{\infty } P_\ell z^\ell , $$

where \(P_{\tilde{x}}(z)\) is the optimal one-step-ahead prediction filter for \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\) and \(\varOmega _{\tilde{x}\tilde{x}}\) is the minimum prediction error covariance [11]. Thus, for any other filter \(\hat{P}_{\tilde{x}}(z)\) of the same monic form, we have

$$ \int _{-\pi }^\pi \hat{P}_{\tilde{x}}(e^{i\theta })S_{\tilde{x}\tilde{x}}(\theta )\hat{P}_{\tilde{x}}(e^{i\theta })^{*}\frac{d\theta }{2\pi }\ge \varOmega _{\tilde{x}\tilde{x}}, $$

in the sense of positive definiteness, with equality when \(\hat{P}_{\tilde{x}}(z)=P_{\tilde{x}}(z)\) [11]. The entropy rate of the process \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\) is defined as:

$$ h(\tilde{x}):=\int _{-\pi }^{\pi }\log \det (S_{\tilde{x}\tilde{x}}(\theta ))\frac{d\theta }{2\pi }, $$

which reflects the differential entropy rate of the process \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\) [12, 13]. It is also related to the optimal one-step-ahead prediction error covariance via the Szegő–Kolmogorov formula [11]:

$$\begin{aligned} \int _{-\pi }^{\pi } \log \det (S_{\tilde{x}\tilde{x}}(\theta )) \frac{d\theta }{2\pi }=\log \det (\varOmega _{\tilde{x}\tilde{x}}). \end{aligned}$$
(4)
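Equation (4) can be checked numerically on a toy example. For a scalar AR(1) process \(x(t)=a\,x(t-1)+e(t)\) with \(\mathrm{Var}(e)=\sigma^2\), the PSD is \(S(\theta)=\sigma^2/|1-a e^{-i\theta}|^2\), and the left-hand side of Eq. (4) equals \(\log \sigma^2\). A minimal sketch (the AR(1) example and all parameter values are our own, not from the paper):

```python
import numpy as np

# Toy check of Eq. (4): scalar AR(1), x(t) = a*x(t-1) + e(t), Var(e) = sig2.
a, sig2 = 0.8, 1.5
theta = np.linspace(-np.pi, np.pi, 1 << 16, endpoint=False)
S = sig2 / np.abs(1.0 - a * np.exp(-1j * theta)) ** 2   # PSD of the AR(1) process
lhs = np.log(S).mean()       # uniform-grid average = (1/2pi) * integral of log S
print(lhs, np.log(sig2))     # both sides of Eq. (4) agree: ~0.4055
```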

The entropy rate \(h(\tilde{x})\) provides a measure of the stochastic uncertainty of the process \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\); by Eq. (4), minimizing it is equivalent to minimizing the log-determinant of the optimal one-step-ahead prediction error covariance of the residual. We propose to design an optimal regression filter F(z) that minimizes the uncertainty of \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\) by solving the following minimization problem:

$$\begin{aligned} \min _{F(z)}\left\{ h(\tilde{x}) \mid {x}(t)=F(z){y}(t)+\tilde{x}(t) \right\} . \end{aligned}$$
(5)

2.2 Optimal Filter

The solution to the optimization problem in Eq. (5) can be obtained from optimal prediction theory, based on a multivariate autoregressive representation of the joint process \(\{{u}(t)\}_{t\in {\mathbb Z}}\). To introduce the solution, we let \(P_{{u}}(z)\) denote the optimal one-step-ahead prediction filter for \(\{{u}(t)\}_{t\in {\mathbb Z}}\), of the same monic form as above, with the corresponding innovation process \(\{{\varvec{\varepsilon }}(t):=P_{{u}}(z){u}(t)\}_{t\in {\mathbb Z}}\), whose covariance is denoted by \(\varOmega _{{u}{u}}\). By decomposing \({u}(t)\) into \([{x}(t)^T, {y}(t)^T]^T\), we obtain the corresponding decompositions of \({\varvec{\varepsilon }}(t)\) and \(P_{{u}}(z)\) as:

$$\begin{aligned} {\varvec{\varepsilon }}_{1}(t)&=P_{11}(z){x}(t)+P_{12}(z){y}(t),\end{aligned}$$
(6)
$$\begin{aligned} {\varvec{\varepsilon }}_{2}(t)&=P_{21}(z){x}(t)+P_{22}(z){y}(t), \end{aligned}$$
(7)

and the corresponding block decomposition of the covariance of \({\varvec{\varepsilon }}(t)\): \( \varOmega _{uu}=\left[ \begin{matrix}\varOmega _{11} & \varOmega _{12}\\ \varOmega _{21} & \varOmega _{22}\end{matrix}\right] . \) We note that \(\varOmega _{{u}{u}}\) is minimal, in the sense of positive definiteness, among the error covariances of all one-step-ahead predictors of \({u}(t)\), and that the minimal error covariance of \({\varvec{\varepsilon }}_{1}(t)\) is \(\varOmega _{11}\). Equation (6) can be rewritten as:

$$ {\varvec{\varepsilon }}_{1}(t)=P_{11}(z)\left[ {x}(t)+P_{11}(z)^{-1}P_{12}(z){y}(t)\right] . $$

Comparing the above equation with Eq. (2), we see that, if \(P_{11}(z)\) is stably invertible, the optimal solution to Eq. (5) is given by \(\hat{F}(z)=-P_{11}(z)^{-1}P_{12}(z)\), and the optimal one-step-ahead prediction filter of \(\{\tilde{x}(t) \}_{t\in {\mathbb Z}}\) is \(P_{11}(z)\), with the corresponding error covariance \(\varOmega _{\tilde{x}\tilde{x}}=\varOmega _{11}\).
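In practice, \(P_{{u}}(z)\) is obtained from a fitted finite-order vector autoregressive (VAR) model, and the optimal filter can then be evaluated on a frequency grid. A minimal sketch, assuming VAR coefficients \(A_\ell\) with \(u(t)=\sum_\ell A_\ell u(t-\ell)+{\varvec{\varepsilon }}(t)\) (function names are our own):

```python
import numpy as np

def prediction_filter(A, theta):
    """P_u(e^{i*theta}) = I - sum_l A[l-1] e^{i*l*theta} for a VAR model
    u(t) = sum_l A[l-1] u(t-l) + eps(t), with A of shape (p, n, n)."""
    p, n, _ = A.shape
    P = np.eye(n, dtype=complex)
    for l in range(1, p + 1):
        P -= A[l - 1] * np.exp(1j * l * theta)
    return P

def optimal_filter(A, k, theta):
    """Optimal regression filter F_hat(e^{i*theta}) = -P11^{-1} P12,
    where x(t) consists of the first k components of u(t)."""
    P = prediction_filter(A, theta)
    return -np.linalg.solve(P[:k, :k], P[:k, k:])
```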

2.3 Partial Correlation Analysis

The PSD of \(\{\tilde{x}(t)\}_{t\in {\mathbb Z}}\) corresponding to the optimal filter is given by:

$$ S_{\tilde{x}\tilde{x}}(\theta )=P_{11}(e^{i\theta })^{-1}\varOmega _{\tilde{x}\tilde{x}}P_{11}(e^{i\theta })^{-*}, $$

where \(\varOmega _{\tilde{x}\tilde{x}}=\varOmega _{11}\). This result provides several novel measures for partial correlation analysis. We define a static partial correlation measure between the ith and jth time series of \({x}(t)\) after regressing out \(\{{y}(t)\}_{t\in {\mathbb Z}}\) as: \(C_{ij}^\mathrm{pc}=\frac{[R_{\tilde{x}\tilde{x}}]_{ij}}{\sqrt{[R_{\tilde{x}\tilde{x}}]_{ii}[R_{\tilde{x}\tilde{x}}]_{jj}}},\) where \(R_{\tilde{x}\tilde{x}}=\int _{-\pi }^{\pi }S_{\tilde{x}\tilde{x}}(\theta )\frac{d\theta }{2\pi }\) is the covariance of \(\tilde{x}(t)\). Moreover, we also define a frequency-domain correlation function as:

$$\begin{aligned} C_{ij}^\mathrm{pc}(\theta )=\frac{[S_{\tilde{x}\tilde{x}}(\theta )]_{ij}}{\sqrt{[S_{\tilde{x}\tilde{x}}(\theta )]_{ii}[S_{\tilde{x}\tilde{x}}(\theta )]_{jj}}}, \end{aligned}$$
(8)

which reflects the correlation between the signals at each frequency. We note that \(C_{ij}^\mathrm{pc}(\theta )\) is different from the partial correlation functions used in [2, 4, 5], which are defined using the inverse PSD \(S^{-1}_{{u}{u}}(\theta )\). The relation between these measures will be explored elsewhere.
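Both measures are straightforward to compute from a fitted VAR model. A sketch building on prediction_filter from the previous snippet (our own naming; note that \(C_{ij}^\mathrm{pc}(\theta )\) is complex-valued in general, so one might plot its real part or magnitude):

```python
import numpy as np

def residual_spectrum(A, Omega, k, theta):
    """S_xx~(theta) = P11^{-1} Omega_11 P11^{-*}, with P from prediction_filter."""
    P11 = prediction_filter(A, theta)[:k, :k]
    P11_inv = np.linalg.inv(P11)
    return P11_inv @ Omega[:k, :k] @ P11_inv.conj().T

def partial_correlations(A, Omega, k, n_grid=4096):
    """Static C^pc and frequency-domain C^pc(theta) of Eq. (8)."""
    thetas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    S = np.stack([residual_spectrum(A, Omega, k, t) for t in thetas])
    d = np.sqrt(np.einsum('fii->fi', S).real)        # sqrt of diagonal PSDs
    C_theta = S / (d[:, :, None] * d[:, None, :])    # Eq. (8); complex in general
    R = S.mean(axis=0).real                          # R_xx~ = (1/2pi) integral of S
    r = np.sqrt(np.diag(R))
    return R / np.outer(r, r), thetas, C_theta
```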

2.4 Causality Analysis

Granger causality is a widely used approach for studying linear dependence among multivariate stochastic processes [6]. Let \(\varOmega _{{x}{x}}\) denote the optimal one-step-ahead prediction error covariance of \({x}(t)\) based on its past values alone. Then, \(\varOmega _{{x}{x}}\) is larger (in the positive definite sense) than the prediction error covariance \(\varOmega _{\tilde{x}\tilde{x}}=\varOmega _{11}\) obtained using the past values of both \({x}(t)\) and \({y}(t)\). One measure of linear feedback from \({y}(t)\) to \({x}(t)\) was defined in [6] as:

$$\begin{aligned} F_{{y}\rightarrow {x}}=\log \frac{\det (\varOmega _{{x}{x}})}{\det (\varOmega _{\tilde{x}\tilde{x}})}. \end{aligned}$$
(9)

A frequency-domain expression for \(F_{{y}\rightarrow {x}}\) was also introduced in [6] via the algebraic reformulation \(\left[ \begin{matrix}Q_{11}(e^{i\theta }) & Q_{12}(e^{i\theta })\\ Q_{21}(e^{i\theta }) & Q_{22}(e^{i\theta })\end{matrix} \right] =\left[ \begin{matrix}P_{11}(e^{i\theta }) & P_{12}(e^{i\theta })\\ P_{21}(e^{i\theta }) & P_{22}(e^{i\theta })\end{matrix} \right] ^{-1}.\) The frequency-domain Geweke causality measure (GCM) was then defined in [6] as:

$$\begin{aligned} f^\mathrm{GCM}_{{y}\rightarrow {x}}(\theta ):=\log \frac{\det (S_{{x}{x}}(\theta ))}{\det (S_{{x}{x}}(\theta )-Q_{12}(e^{i\theta })\varOmega _{2\mid 1}Q_{12}(e^{i\theta })^*) }, \end{aligned}$$
(10)

where \(\varOmega _{2\mid 1}:=\varOmega _{22}-\varOmega _{21}\varOmega _{11}^{-1}\varOmega _{12}\). The main motivation for the particular form of the denominator of \(f^\mathrm{GCM}_{{y}\rightarrow {x}}(\theta )\) in [6] was to ensure proper normalization, so that the integral \( \int _{-\pi }^{\pi } f^\mathrm{GCM}_{{y}\rightarrow {x}}(\theta ) \frac{d\theta }{2\pi }=F_{{y}\rightarrow {x}} \) holds. We note, however, that the heuristic expression \(S_{{x}{x}}(\theta )-Q_{12}(e^{i\theta })\varOmega _{2\mid 1}Q_{12}(e^{i\theta })^*\) is not equal to the PSD \(S_{\tilde{x}\tilde{x}}(\theta )\) of the residual process. Thus, it does not reflect the fraction of variance that can be explained by regressing out \(\{{y}(t)\}_{t\in {\mathbb Z}}\). Consequently, we define the optimal frequency-domain causality measure as:

$$\begin{aligned} f_{{y}\rightarrow {x}}(\theta ):=\log \frac{\det (S_{{x}{x}}(\theta ))}{\det (S_{\tilde{x}\tilde{x}}(\theta ))}, \end{aligned}$$
(11)

which also satisfies \( \int _{-\pi }^{\pi } f_{{y}\rightarrow {x}}(\theta ) \frac{d\theta }{2\pi }=F_{{y}\rightarrow {x}}\). Finally, we note that this causality measure is invariant under the application of a common scalar filter to all the time series, which can be proved following the derivation in [3]. The derivations of Eqs. (8) and (11) for computing the true partial correlation and causality are the novel contributions of this work.
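Both frequency-domain measures can be computed from the same fitted VAR model, which makes the difference between Eq. (10) and Eq. (11) easy to inspect numerically. A sketch reusing prediction_filter and residual_spectrum from the snippets above (our own naming):

```python
import numpy as np

def causality_measures(A, Omega, k, theta):
    """Proposed f_{y->x}(theta) of Eq. (11) and the Geweke measure of Eq. (10)."""
    P = prediction_filter(A, theta)
    Q = np.linalg.inv(P)
    S_uu = Q @ Omega @ Q.conj().T                    # joint PSD of u(t)
    S_xx = S_uu[:k, :k]
    S_res = residual_spectrum(A, Omega, k, theta)    # PSD of the residual
    logdet = lambda M: np.log(np.linalg.det(M).real)
    f_opt = logdet(S_xx) - logdet(S_res)             # Eq. (11)
    O11, O12 = Omega[:k, :k], Omega[:k, k:]
    O21, O22 = Omega[k:, :k], Omega[k:, k:]
    O_cond = O22 - O21 @ np.linalg.inv(O11) @ O12    # Omega_{2|1}
    Q12 = Q[:k, k:]
    f_gcm = logdet(S_xx) - logdet(S_xx - Q12 @ O_cond @ Q12.conj().T)  # Eq. (10)
    return f_opt, f_gcm
```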

3 Experiments

We used the proposed measures to analyze resting-state functional MRI data of 100 subjects from the HCP [16], where each subject had four 15-minute runs provided in the standard surface space. The acquisition parameters were TE/TR = 33/720 ms. We used the functional parcellation of [18] to cluster the fMRI data into 7 networks, namely the Visual (VS), Somatomotor (SM), Dorsal Attention (DA), Ventral Attention (VA), Limbic (LB), Frontoparietal (FP), and Default (DF) networks. By construction of this parcellation [18], intra-network correlations are stronger than inter-network correlations. For each data set, we first removed the mean value of the signal at each vertex and regressed out the global average signal. Then, we took the average signal of each brain network as its representative signal, obtaining a 7-dimensional time series. The aim of this experiment was to analyze the partial correlation between each pair of networks and to determine their linear dependence on the other 5 networks. As a first step, we estimated the autocorrelation sequence of the 7-dimensional time series for each of the four 15-minute runs and averaged the autocorrelations over the 4 runs. We then estimated, for different orders, the multivariate AR model consistent with the averaged autocorrelation sequence. The order with the lowest Akaike information criterion (AIC) [1] was selected, which in our data set was 6. From the 6th-order AR model of each subject, we computed the partial correlation coefficients \(C^\mathrm{pc}\) and \(C^\mathrm{pc}(\theta )\) between each pair of networks by regressing out the signals of the other 5 networks. For comparison, we computed the standard correlation coefficient \( C_{ij}=\frac{[R_{{u}{u}}]_{ij}}{\sqrt{[R_{{u}{u}}]_{ii}[R_{{u}{u}}]_{jj}}} \), with \(R_{{u}{u}}\) being the 7-dimensional sample covariance of the rsfMRI measurements, and computed the frequency-domain correlation function:

$$ C_{ij}(\theta )=\frac{[S_{{u}{u}}(\theta )]_{ij}}{\sqrt{[S_{{u}{u}}(\theta )]_{ii}[S_{{u}{u}}(\theta )]_{jj}}}, $$

without regressing out any signals. Moreover, we also compared the Geweke causality measure \(f_{{y}\rightarrow {x}}^\mathrm{GCM}(\theta )\) with the proposed causality measure \(f_{{y}\rightarrow {x}}(\theta )\).
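For concreteness, a minimal sketch of the model-fitting step under our own naming conventions; the block Yule-Walker solve and the specific AIC formula are assumptions on our part, since the paper specifies only that the AR model is matched to the estimated autocorrelation sequence:

```python
import numpy as np

def fit_var_yule_walker(R, p):
    """Fit a VAR(p) from autocovariances R[j] ~ E[u(t) u(t-j)^T], j = 0..p,
    by solving the block Yule-Walker equations sum_l A_l R(j-l) = R(j)."""
    n = R[0].shape[0]
    G = np.block([[R[j - l] if j >= l else R[l - j].T
                   for j in range(1, p + 1)] for l in range(1, p + 1)])
    C = np.hstack([R[j] for j in range(1, p + 1)])
    B = np.linalg.solve(G.T, C.T).T                  # B = [A_1 ... A_p]
    A = B.reshape(n, p, n).transpose(1, 0, 2)        # A[l-1] is the lag-l block
    Omega = R[0] - sum(A[l] @ R[l + 1].T for l in range(p))
    return A, Omega

def var_aic(Omega, p, n, T):
    """A standard AIC form for a Gaussian VAR(p) fitted to T samples of n series."""
    return np.log(np.linalg.det(Omega)) + 2.0 * p * n * n / T

# Order selection: pick the p with the lowest AIC (order 6 in our data).
# With TR = 0.72 s, angular frequency maps to Hz as f = theta / (2*pi*TR),
# so theta = pi corresponds to the Nyquist frequency of about 0.69 Hz.
```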

4 Results

Figure 1 shows a few representative plots from our experiments (additional results are provided in the supplementary material). In particular, the first row shows the correlation and partial correlation results for three pairs of networks, namely VS-DF, DA-VA, and SM-FP. The plots show the average over the 100 subjects. The results for the other pairs of networks are similar but are not included due to space constraints. The dashed blue line shows the static correlation coefficient, and the solid blue line shows the frequency-domain correlation function. The dashed and solid red lines show the static and frequency-domain partial correlations, respectively. The second row shows the causality measures, which reflect the linear dependence of each pair of networks on the other 5 networks.

We note that the static partial correlation coefficients (dashed red) for VS-DF and SM-FP are much higher than the standard correlation coefficients (dashed blue), whereas the two have similar values for the DA-VA pair. Note that the frequency-domain correlation and partial correlation reveal more detailed information at different frequencies. In particular, the partial correlation is much higher than the standard correlation at low frequencies for all three pairs of networks. The significant differences at low frequencies are expected, since BOLD fluctuations lie primarily in the 0–0.1 Hz range [15]. We also note that no temporal filtering was applied in the preprocessing step to remove high-frequency signals.

The causality analysis results in the second row of Fig. 1 quantify the fraction of variance explained by the dynamic regression. The proposed measure \(f_{{y}\rightarrow {x}}(\theta )\) explains a much larger portion of the variance than \(f_{{y}\rightarrow {x}}^\mathrm{GCM}(\theta )\) in the 0–0.1 Hz low-frequency range, indicating that the proposed dynamic regression successfully regresses out low-frequency BOLD signals from the analyzed signal. This information is not visible in the \(f_{{y}\rightarrow {x}}^\mathrm{GCM}(\theta )\) curves, which are nearly uniform across frequencies. Thus, our results provide supporting evidence for using dynamic regression to remove undesired low-frequency influences from other networks. We also note that at high frequencies (>0.1 Hz) the BOLD signal is dominated by noise, and hence \(f_{{y}\rightarrow {x}}(\theta )\) may take negative values. This is consistent with standard fMRI processing pipelines, in which signal above 0.1 Hz is filtered out.

Fig. 1. The average correlation and causality measures over the 100 HCP subjects for the VS-DF, DA-VA, and SM-FP network pairs: the first row shows the correlation and partial correlation results for each pair of networks, and the second row shows the Geweke and the proposed causality measures. As can be seen, the proposed method explains a significant portion of the variance.

5 Conclusion

We proposed a novel framework for partial correlation and causality analysis using a dynamic regression model. We derived the solution to this dynamic regression problem using optimal prediction theory and proposed several novel partial correlation and causality measures. In particular, our frequency-domain causality measure provides information about the fraction of regressed variance at different frequencies, which is not provided by the classical result of [6], to date one of the most popular methods for analyzing fMRI time series data. Our initial experimental results on HCP data provide supporting evidence for using the proposed dynamic regression method for functional brain network analysis.