Keywords

1 Introduction

Analyzing quasiperiodic variations in a video sequence is frequently performed in medical imaging with the goal of extracting information related to the cardiac or respiratory cycles. Natural video of a person’s face or hand can be used for non-contact monitoring of vital signs including heart rate, respiratory rate, and pulse transit time [10, 11] or to generate maps showing the spatial distribution of tissue perfusion [7, 12]. Fourier-decomposition MRI is an emerging technique for imaging lung perfusion and ventilation that relies on Fourier analysis of a sequence of non-contrast enhanced MR images [3]. Another area where periodicity is useful in medical imaging is in the detection of critical pulsating structures in medical interventions [1, 2, 8, 9]. Most of these techniques require extensive filtering and preprocessing that have been fine-tuned for their respective problems. This is necessary to produce a sufficiently clean periodic signal that can be extracted through Fourier analysis or band-pass filtering.

In this paper we propose the use of dynamic linear modeling for analyzing periodicity in video sequences. We show how a cyclic + random walk model can be used to estimate the frequency and amplitude of quasi-periodic components. We also propose a log-likelihood ratio statistic for determining the presence of periodicity. In addition, we derive the power spectral density function for this model and show that it closely resembles that of the observed spectrum commonly found in video sequences. This approach is applied to natural video, ultrasound and MRI.

2 Methods

2.1 Dynamic Linear Models

Dynamic Linear Models (DLM) are linear state-space time series models [6] of the form,

$$\begin{aligned} y_t = Z a_t + \epsilon _t, \qquad \epsilon _t \sim N(0,\varSigma _\epsilon ), \qquad a_t = T a_{t-1} + \eta _t, \qquad \eta _t \sim N(0,\varSigma _\eta ), \end{aligned}$$
(1)

where \(y_t\) and \(a_t\) are the observation and state vectors at time t. The observation and state transition matrices are Z and T with independent normally distributed noise covariances \(\varSigma _\epsilon \) and \(\varSigma _\eta \). Often these are block diagonal matrices formed from several simpler models whose parameters have an intuitive or physical meaning. Maximum likelihood estimates (MLE) for unknown parameters in Z, T, \(\varSigma _\epsilon \) and \(\varSigma _\eta \) can be obtained by numerically optimizing the log-likelihood function,

$$\begin{aligned} \log L(\theta \vert Y_t) = \sum \log p(y_t \vert Y_t,\theta ), \end{aligned}$$
(2)

where \(Y_t\) denotes the vector of observations up to time t, \(\theta \) are the unknown model parameters and \(p(\cdot )\) is the probability density function. Kalman filtering is used to evaluate \(\log L(\theta \vert Y_t)\).

We propose a nested DLM to model quasi-periodicity in video sequences. This model consists of a stationary cyclic component, random walk component and additive measurement noise as specified in Eq. 3,

$$\begin{aligned} Z = \begin{bmatrix} 1&0&1 \end{bmatrix}, \quad T = \begin{bmatrix} \rho \cos {\omega _0}&\rho \sin {\omega _0}&0\\ -\rho \sin {\omega _0}&\rho \cos {\omega _0}&0\\0&0&1 \end{bmatrix}, \quad \varSigma _\epsilon =\sigma _n^2, \quad \varSigma _\eta = \begin{bmatrix} \sigma _c^2&0&0\\0&\sigma _c^2&0\\0&0&\sigma _l^2 \end{bmatrix}. \end{aligned}$$
(3)

The parameters \(\omega _0\) and \(\rho \) represent the frequency and bandwidth of the cyclic component. The variances \(\sigma _c^2\), \(\sigma _l^2\) and \(\sigma _n^2\) specify the strength of cyclic, random walk and additive noise components respectively. In the state vector, \(a_t = [a_{t,1}, a_{t,2}, a_{t,3}]^T\), the first two states, \(a_{t,1}\), and \(a_{t,2}\) are analogous to the real and imaginary components of a complex oscillator, while the third state, \(a_{t,3}\) follows a random walk to account for signal drift and other low frequency variations. The model observation matrix, Z, adds \(a_{t,1}\) and \(a_{t,3}\) to obtain a cyclic model with a local level that follows a random walk.

The power spectral density of the model can be derived by breaking apart the cyclic and random walk components. Transfer functions from the cyclic component state noise \(\eta _{t,1}\) and \(\eta _{t,2}\) to the output \(y_t\) can be obtained directly from the state-space model:

$$\begin{aligned} G_1(z) = \frac{z - \rho \cos (\omega _0)}{z^2 -2\rho \cos (\omega ) z + \rho ^2}, G_2(z) = \frac{- \rho \sin (\omega _0)}{z^2 -2\rho \cos (\omega ) z + \rho ^2}, \end{aligned}$$
(4)

where \(G_1(z)\), \(G_2(z)\) are the respective transfer functions and z is the forward shift operator. The power spectral density can be obtained by evaluating the transfer functions at \(z=e^{j\omega }\) yielding,

$$\begin{aligned} S_{yy}^{cyclic}(\omega ) = \frac{\sigma _c^2 (1+\rho ^2-2\rho \cos {\omega _0} \cos {\omega })}{[1+\rho ^2-2\rho \cos (\omega - \omega _0)] [1+\rho ^2-2\rho \cos (\omega + \omega _0)]}. \end{aligned}$$
(5)

The random walk is non-stationary with a well known frequency drop-off approximately proportional to \(\omega ^{-2}\). An exact expression for its power spectral density, \(S_{yy}^{rw}(\omega )\), with a finite length time series can be obtained through Fourier analysis where the Fourier transform of the random walk state, \(a_{t,3}\) is expressed in terms of the incremental state noise, \(\eta _{t,3}\):

$$\begin{aligned} F(a_{t,3})=\sum _{i=0}^N F( \eta _{i,3} H(t-i) ), \end{aligned}$$
(6)

where F is the discrete Fourier transform operator and H(t) is the Heaviside function. Since \(\eta _{t,3}\) are independent random variables, the power spectral density can be obtained from the expected value of Eq. 6 as,

$$\begin{aligned} S_{yy}^{rw}(\omega )= \mathbf {E} {|}F(a_{t,3}){|}^2 = \sum _{i=1}^N \mathbf {E}{|}F(\eta _{t,3} H(t-i)){|}^2=\frac{\sigma _l^2}{1-\cos {\omega }}. \end{aligned}$$
(7)

The total power spectral density for Eq. 3 is obtained from Eqs. 5 and 7 plus the white additive measurement noise,

$$\begin{aligned} S_{yy}(\omega ) = \frac{\sigma _c^2 (1+\rho ^2-2\rho \cos {\omega _0} \cos {\omega })}{[1+\rho ^2-2\rho \cos (\omega - \omega _0)] [1+\rho ^2-2\rho \cos (\omega + \omega _0)]}+\frac{\sigma _l^2}{1-\cos {\omega }} +\sigma _n^2. \end{aligned}$$
(8)

As can be seen in Fig. 1, the power spectral density of the model slopes downward at −20 dB/decade with a small peak near \(\omega _0\) corresponding to the cyclic component. This closely resembles the empirical spectrum observed in Figs. 2 and 4. When \(\sigma _c^2 \rightarrow 0\), the cyclic component vanishes and we are left with just the random walk and additive noise also known in structural time series as the local level model. Thus, to test for the presence of a cyclic component, we can test \(\mathcal{H}_0: \sigma _c^2 = 0\) against simple negation. Let \(\hat{\theta }_0\) denote the MLE in the restricted model, \(\sigma _c^2 = 0\). Then, if \(\mathcal{H}_0\) is correct, the likelihood-ratio test statistic,

$$\begin{aligned} D = -2 [\log L(\hat{\theta } \vert Y_t) - \log L(\hat{\theta }_0 \vert Y_t)], \end{aligned}$$
(9)

measures the relative plausibility of \(\mathcal{H}_0\). We use D to test for periodically varying regions in the video.

Fig. 1.
figure 1

The analytic spectrum of the DLM given in Eq. (8) and convergence of the MLE of model parameters are verified through numerical simulations. These graphs are shown for \(\omega _0 = 0.3\), \(\rho = 0.95\), \(\sigma _c/\sqrt{(1-\rho ^2)} = 5\sigma _l = 5\sigma _n\)

2.2 Experiments

To demonstrate the effectiveness of this model, we considered three very different datasets consisting of MRI, ultrasound and natural video. The first dataset consisted of natural video of a human hand. We demonstrate how DLM can be used to estimate the frequency (heart rate) and amplitude (perfusion map) from the video. Next we apply DLM to an ultrasound video of the lumbar spine where pixels in the dura exhibit subtle pulsation. Here we use the likelihood ratio statistic, D, to test for the presence of periodicity. Finally, we consider a free-breathing lung MRI sequence where local ventilation images can also be generated from DLM.

3 Results

3.1 Natural Video

Photoplethysmogram (PPG) is an optical measurement of cardiac activity. Typically, infrared light is used in pulse-oximeters due to better tissue penetration depth. However, the ubiquity of digital cameras has led to increasing interest in monitoring vital signs using ambient visible light. These systems have the advantage of monitoring vital signs remotely without requiring carefully controlled lighting or any direct contact with the patient. Methods have been proposed to extract heart rate, respiratory rate, and pulse transit time from videos of a person’s face or hand [10, 11]. Beyond simply measuring vital signs, it is also possible to generate spatial maps showing variations in the magnitude of the PPG signal [7, 12]. PPG imaging has the potential to show tissue perfusion relevant to many clinical problems such as evaluating skin-flaps and burn injuries. These techniques typically require extensive preprocessing to detrend the PPG signal and remove the effects of motion or variations in ambient light. For this experiment 8 videos were acquired showing the hands of seven subjects. Videos 1&2 were aquired of the same subject to demonstrate perfusion mapping. The hand was gently scratched between these aquisitions to stimulate blood flow. This test has been previously used for perfusion mapping techniques based on PPG imaging [7] and laser Doppler imaging [4].

First, we demonstrate that heart rate can be estimated from all 8 videos. A PPG signal was extracted by averaging the green channel intensity over a 400\(\,\times \,\)400 block as shown in Fig. 2. The proposed DLM was fit to the PPG signal and the estimate of \(\omega _0\) compared with the Fourier spectral peak and the readings from the pulse oximeter (Table 1). The absolute error between DLM frequency estimation and the nearest pulse oximeter reading was \(2.3\pm 1.0\) bpm (mean ± standard error). This was significantly lower (\(p<0.05\)) than that estimates through simple Fourier analysis or quadratic peak interpolation (QPI) of the spectral peak, which were \(5.4 \pm 1.3\) and \(3.8 \pm 1.4\) respectively.

Fig. 2.
figure 2

Video clips were recorded of the subjects’ hands and the PPG signal was calculated from the average intensity over the 400\(\,\times \,\)400 box. The −20 dB/decade slope observed in the spectrum is consistent with the proposed DLM.

Table 1. Heart rate estimated from the 8 video clips (bpm)

Next, to generate perfusion maps, the DLM is fitted on a per pixel level for videos 1&2. Here, the amplitude of the cyclic component corresponds to perfusion. The original 1080\(\,\times \,\)1920 videos were reduced to 135\(\,\times \,\)240 by applying a Gaussian blur with \(\sigma =8\) and a down-sampling factor of 8. The DLM was fit with the frequency fixed to the MLE estimate given in Table 1. The quantity \(\sigma _c/(\sqrt{1-\rho ^2})\), corresponding to the amplitude of the cyclic component, is shown in Fig. 3. For comparison, the Fourier estimates of amplitude are also shown.

Fig. 3.
figure 3

PPG imaging using Fourier analysis and DLM. The yellow arrow indicates where the hand was scratched.

Without any preprocessing or tuning for this specific problem, a relatively simple DLM was able to identify very subtle changes in tissue perfusion that occurred after gently scratching the hand. Furthermore, this was accomplished with very short video clips only 5 s in length unlike previous methods that require much longer video clips and extensive preprocessing and detrending.

3.2 Ultrasound

Dural pulsation is a valuable cue in ultrasound guided epidural injections. Previously McLeod et al. [8] proposed an extended Kalman filtering (EKF) method that estimated frequency and amplitude of the pulsating dura on a per-pixel basis in lumbar spine ultrasound. Here, our main objective is to identify which pixels in the image exhibit periodicity. The likelihood-ratio statistic in Eq. 9 is ideal for this purpose. We fit the proposed DLM on a per-pixel basis to a video of the lumbar ultrasound and compared it against those obtained from the EKF method in McLeod et al. [8]. The results are nearly identical despite the EKF having being developed for this application only, and requiring extensive smoothing and tight thresholds on the frequency and amplitude (Fig. 4).

Fig. 4.
figure 4

The average power spectrum of the lumbar ultrasound ROI exhibits an approximately −20 dB/decade slope with a small peak at the cardiac frequency, closely resembling the theoretical DLM spectrum in Fig. 1. The likelihood ratio estimated from the DLM immediately reveals the location of the dural pulsation. The results are very similar to the hand-crafted EKF method [8].

3.3 MRI

Fourier-decomposition of free-breathing proton MRI (FDMRI) has recently emerged as a non-contrast enhanced MRI technique to generate regional pulmonary ventilation maps on any clinically available MRI system [3, 5]. This technique exploits fast pulmonary MRI acquisition and non-rigid image registration to acquire a time series of registered proton MR images. Since the proton density within lung tissue varies with the respiratory cycle as the aveoli expand and contract, periodicity in the signal intensity of the registered images provides a measure of tissue ventilation. In FDMRI, the amplitude of the Fourier component at the respiratory frequency is used as a measure of lung ventilation. We acquired a dynamic free-breathing MRI of a non-small cell lung cancer patient over 125 s at a rate of 4 frames per second where the left lung was obstructed and poorly ventilated. A hyperpolarized 129Xe MR static ventilation image was acquired as a benchmark and shows a lack of ventilation in the left lung. The DLM was fit to the MRI sequence (Fig. 5). The DLM amplitude map was visually similar to FDMRI, but with slightly better rejection of background tissue motion. The likelihood-ratio statistic provided a statistical test for the presence of ventilation. It showed a lack of ventilation in the left lung, only exhibiting motion artifacts around the lung boundary and was qualitatively closest to the 129Xe MRI.

Fig. 5.
figure 5

The benchmark 129Xe MR image is compared with FDMRI and the two DLM images (amplitude, and likelihood-ratio). The cyan overlays display the ventilation maps and the yellow arrows point to the lack of ventilation in the left lung. Of the three images generated from the free-breathing sequence, the DLM likelihood-ratio appears most similar to the 129Xe MR image.

4 Discussion and Conclusion

We have shown how DLM provides a powerful framework for analyzing periodicity in various video sequences. The same DLM was applied directly to natural video, ultrasound and MRI without additional preprocessing or fine-tuning and it provided frequency and amplitude estimates as well as log-likelihood statistic testing for the presence of periodicity. The main strength of this model is that it provides a general and robust method for analyzing periodicity in video sequences that typically require individually handcrafted techniques. In addition, DLM can easily be extended to include multiple frequencies and harmonics as well as more advanced background noise models.