1 Introduction

Cardiac motion estimation and regional deformation analysis are important for the detection of myocardial dysfunction. Tracking methods typically follow speckles (texture patterns) or image-derived features (e.g., surfaces) over the cardiac cycle to produce a Lagrangian dense motion field, in which the displacement vectors at each frame reference the material points of the initial frame. However, inherent properties of ultrasound (US) can create image artifacts that cause speckle decorrelation and poor tracking results. Effective regularization of the raw tracking results is therefore essential. Various speckle tracking methods have been proposed, including block matching [1], optical flow [2], and registration techniques [3, 4]. These methods generally apply spatial and temporal regularization separately.

Several recently proposed approaches use joint spatiotemporal regularization. A free-form deformation (FFD) method with 2D spatiotemporal B-splines, proposed in [3], was extended to regularize velocities (instead of displacements) using diffeomorphic FFD with 3D B-splines [4]. However, B-splines require a carefully defined explicit grid of control points a priori, which may bias the tracking results. Recently, we proposed [5] learning joint spatiotemporal cardiac motion patterns via sparse dictionaries and reconstructing noisy tracking results with the learned dictionary. However, due to the inherent limitations of K-SVD dictionaries shown in [5], the dictionary representation was applied only to high-error trajectories, which limited regularization performance.

In this work, we propose a neural-network-based method that eliminates the above limitation and applies spatiotemporal regularization to the entire myocardium. The regularization procedure is learned by feeding 4D Lagrangian displacement patches to a multi-layered perceptron (MLP) network [6]. We demonstrate the effectiveness of our procedure for regularizing different tracking techniques, including block matching on radio-frequency (RF) images [1], non-rigid registration using FFD [7], and a graph-based tracking method with learned weights [8]. We further propose combining complementary tracking methods using a multi-view learning framework [9]. Our experiments show that combining complementary tracking methods leads to the best overall estimation. Finally, we apply the combined architecture to a different set of 4D echocardiography images and show the plausibility of domain adaptation. This suggests that the learned regularization procedures can be adapted and applied to other echocardiography datasets to improve tracking and strain estimation.

In our experiments, we use 8 synthetic cardiac sequences from [10] that simulate different physiological conditions: one normal sequence; 4 sequences with occlusions in the proximal (ladprox) and distal (laddist) left anterior descending coronary artery, the left circumflex artery (lcx), and the right coronary artery (rca); and 3 sequences with dilated geometry, of which 1 is synchronous (sync) and 2 are dyssynchronous (lbbb, lbbbsmall). These sequences contain realistic US features that pose challenges for tracking. Each sequence contains 2250 ground-truth trajectories. In [11], five speckle tracking algorithms were validated and compared using these sparse ground-truth trajectories located at grid-intersection points. However, regional tracking and strain validation are more appropriate for precise localization of myocardial injury. In this work, we spatially interpolate the sparse ground-truth trajectories to produce dense ground-truth displacement fields for evaluating both dense tracking and regional strain accuracy.

Fig. 1. Extraction of 4D spatiotemporal patches from the dense displacement field

Fig. 2. Process diagram for training and testing of the MLP architecture

2 Method

2.1 Initial Tracking Methods

We demonstrate our method on three widely used, distinctive cardiac tracking methods: radio-frequency image-based block matching (RFBM), free-form deformation (FFD), and flow-network tracking (FNT).

RFBM is a block-matching method applied to 3D radio-frequency (RF) echocardiography images in a spherical coordinate system. Given two consecutive frames, the algorithm maximizes the normalized cross-correlation (NCC) between a 3D block defined around every voxel in the first RF frame and a 3D block within a search region in the second frame [1]. FNT tracks discrete points on the myocardial surfaces while enforcing spatial and temporal consistency in the resulting trajectories. The tracking problem is defined in a graphical framework, where nodes represent points on the endocardial and epicardial surfaces and edges define spatial and temporal connections among points. The edge weights are learned using a Siamese network, and the objective function finds optimal trajectories that adhere to the edge weights while subject to physiological constraints [8]. FFD finds a global transformation given a set of fixed grid points: the grid points parameterize a B-spline transformation that minimizes the difference between a reference frame and an adjacent frame. Spatial regularization is imposed both implicitly, via the smooth B-splines, and explicitly, by minimizing the bending energy, and the optimization follows a coarse-to-fine scheme [7]. For each method, the resulting frame-to-frame displacement field is temporally interpolated and propagated to produce a Lagrangian displacement field, which is then sampled into \(X_{train}\) and \(X_{test}\) as illustrated in Fig. 1.
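The NCC block-matching step described for RFBM can be illustrated with a minimal sketch. It is a simplification, not the RFBM implementation of [1]: it works on a Cartesian grid with integer-voxel displacements and exhaustive search, whereas RFBM operates on RF data in spherical coordinates. The function names `ncc` and `match_block` and the block and search sizes are hypothetical.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally shaped blocks."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def match_block(frame0, frame1, center, half=2, search=3):
    """Exhaustive integer-voxel search for the displacement (dz, dy, dx) of the
    block centred at `center` in frame0 that maximizes NCC in frame1.
    Assumes the reference block lies fully inside frame0."""
    cz, cy, cx = center
    ref = frame0[cz - half:cz + half + 1, cy - half:cy + half + 1, cx - half:cx + half + 1]
    best_score, best_disp = -np.inf, (0, 0, 0)
    for dz in range(-search, search + 1):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                z, y, x = cz + dz, cy + dy, cx + dx
                # Skip candidate blocks that would extend outside frame1.
                if min(z, y, x) - half < 0 or any(
                    c + half >= s for c, s in zip((z, y, x), frame1.shape)
                ):
                    continue
                cand = frame1[z - half:z + half + 1, y - half:y + half + 1, x - half:x + half + 1]
                score = ncc(ref, cand)
                if score > best_score:
                    best_score, best_disp = score, (dz, dy, dx)
    return best_disp, best_score


# Usage: shift a random volume by a known integer offset and recover it.
rng = np.random.default_rng(0)
frame0 = rng.normal(size=(20, 20, 20))
frame1 = np.roll(frame0, shift=(1, -2, 2), axis=(0, 1, 2))
disp, score = match_block(frame0, frame1, center=(10, 10, 10))
print(disp, round(score, 6))  # (1, -2, 2) 1.0
```

Exhaustive search is the simplest correct form of block matching; practical trackers add sub-voxel refinement and restrict the search window to plausible inter-frame motion.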

We spatially interpolate the sparse set of ground-truth trajectories, provided in [10], with radial basis functions (RBFs) using the method described in [12]. The resulting frame-to-frame displacement fields are temporally interpolated and propagated to produce the Lagrangian displacement field, where ground-truth trajectory patches \(Y_{train}\) and \(Y_{test}\) are sampled as illustrated in Fig. 1.
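As a sketch of the densification and propagation steps, the code below fits an RBF to sparse displacement samples and then chains frame-to-frame fields into Lagrangian displacements by evaluating each field at the current position of every material point. SciPy's `RBFInterpolator` with a thin-plate-spline kernel is a stand-in chosen for illustration; it is not necessarily the RBF scheme of [12], and the helper names `densify` and `propagate` are hypothetical. Temporal interpolation between frames is omitted.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator  # requires SciPy >= 1.7


def densify(sparse_points, sparse_disp):
    """Fit a thin-plate-spline RBF to sparse displacement samples.
    Returns a callable mapping (m, 3) positions to (m, 3) displacements."""
    return RBFInterpolator(sparse_points, sparse_disp, kernel="thin_plate_spline")


def propagate(points0, frame_fields):
    """Chain frame-to-frame displacement fields into Lagrangian displacements.

    frame_fields[t] maps positions at frame t to their displacement to frame t+1.
    Returns U of shape (T + 1, m, 3), where U[t] = x_t - x_0 and U[0] = 0.
    """
    x0 = np.asarray(points0, dtype=float)
    pos = x0.copy()
    U = [np.zeros_like(x0)]
    for field in frame_fields:
        # Evaluate each field at the current position of the material point,
        # not at its reference position x0.
        pos = pos + field(pos)
        U.append(pos - x0)
    return np.stack(U)


# Usage: a linear stretch of 10% per frame along x, sampled at 60 sparse points.
rng = np.random.default_rng(1)
sparse_pts = rng.uniform(0.0, 1.0, size=(60, 3))
A = np.diag([0.1, 0.0, 0.0])
field = densify(sparse_pts, sparse_pts @ A.T)  # displacement d(x) = A x
query = rng.uniform(0.2, 0.8, size=(5, 3))
U = propagate(query, [field, field])
# After two frames x_2 = 1.1 * 1.1 * x_0 along x, so U_x = 0.21 * x_0.
print(np.allclose(U[2][:, 0], 0.21 * query[:, 0], atol=1e-6))  # True
```

Evaluating each frame-to-frame field at the propagated position, rather than at the reference position, is what makes the result Lagrangian; the thin-plate spline's linear polynomial term reproduces the linear test field exactly.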

2.2 Spatiotemporal Displacement Regularization Learning

In the training stage, given noisy initial Lagrangian tracking data, the optimal parameters \(\theta ^*\) are found by solving:

$$\begin{aligned} \theta ^* = \mathop {\arg \,\mathrm{min}}\limits _{\theta } \frac{1}{N}\sum _{i=0}^{N-1}\log \cosh [Y_{train}^{(i)}- f_{\theta }(X_{train}^{(i)})]\;, \end{aligned}$$
(1)

where \(Y_{train}^{(i)}\) is the ground-truth trajectory patch and \(f_{\theta }(X_{train}^{(i)})\) is the regularized trajectory patch for sample i of N samples. Although the \(L_2\) norm (i.e., the sum of squared differences between corresponding patch entries) is widely used, we use the mean log-cosh error, which is more robust to noise and outliers [13].
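A numerically stable way to evaluate the log-cosh loss of Eq. (1) is sketched below, using the identity \(\log\cosh r = |r| + \log(1 + e^{-2|r|}) - \log 2\), which avoids overflow of \(\cosh\) for large residuals. Averaging over patch entries as well as over samples is an assumption; Eq. (1) only specifies the mean over the N samples. The function names are hypothetical.

```python
import numpy as np


def log_cosh(r):
    """Element-wise log(cosh(r)), computed stably as |r| + log1p(exp(-2|r|)) - log 2."""
    a = np.abs(np.asarray(r, dtype=float))
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)


def mean_log_cosh(y_true, y_pred):
    """Mean log-cosh error over samples and patch entries (aggregation is an assumption)."""
    return float(np.mean(log_cosh(np.asarray(y_true) - np.asarray(y_pred))))


# Usage: compare with the squared-error penalty r^2 / 2 on small and large residuals.
r = np.array([0.01, 1.0, 50.0])
print(log_cosh(r))
print(0.5 * r**2)
```

Because the derivative of \(\log\cosh r\) is \(\tanh r\), each residual's gradient is bounded by 1 in magnitude, whereas the squared error's gradient grows linearly with the residual; this is the sense in which the loss is less sensitive to outliers while still behaving like \(r^2/2\) near zero.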

We approximate \(f_{\theta }\) using an MLP network f with three fully connected hidden layers and parameters \(\theta \). To accelerate learning, we use rectified linear units (ReLU) as the activation function. To avoid overfitting, we add a dropout layer after each activation layer. During training, dropout randomly drops the outputs of neurons in order to prevent co-adaptation among neurons [14].
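The architecture just described can be sketched as a NumPy forward pass: three fully connected ReLU layers, each followed by (inverted) dropout during training, and a linear output layer producing the regularized patch. The linear output layer and He initialization are assumptions not stated in the text, the function names are hypothetical, and training (minimizing Eq. (1) by backpropagation) is omitted; in practice it would be done with an automatic-differentiation framework. The defaults of 1000 hidden units and dropout probability 0.2 follow the values reported in Sect. 3.

```python
import numpy as np


def init_mlp(in_dim, hidden=(1000, 1000, 1000), out_dim=None, seed=0):
    """Weights for an MLP with len(hidden) fully connected hidden layers.
    He-normal initialization is an assumption suited to ReLU."""
    out_dim = in_dim if out_dim is None else out_dim
    rng = np.random.default_rng(seed)
    dims = [in_dim, *hidden, out_dim]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(dims[:-1], dims[1:])
    ]


def mlp_forward(params, x, p_drop=0.2, train=False, rng=None):
    """Map flattened noisy patches x (batch, in_dim) to regularized patches."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # fully connected layer + ReLU
        if train:
            # Inverted dropout after each activation: zero each unit with
            # probability p_drop and rescale survivors to keep the expected value.
            keep = rng.random(h.shape) >= p_drop
            h = h * keep / (1.0 - p_drop)
    W, b = params[-1]
    return h @ W + b  # linear output layer (assumed) for displacement regression


# Usage with a toy patch length of 12 and narrow hidden layers.
params = init_mlp(12, hidden=(16, 16, 16))
x = np.random.default_rng(2).normal(size=(4, 12))
y_eval = mlp_forward(params, x)  # dropout disabled at test time
y_train = mlp_forward(params, x, train=True, rng=np.random.default_rng(3))
print(y_eval.shape, y_train.shape)  # (4, 12) (4, 12)
```

Inverted dropout rescales activations during training so that the test-time forward pass needs no correction.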

During testing, we apply the neural network with the learned parameters \(\theta ^*\) onto the noisy trajectory patches \(X_{test}\) to produce corresponding regularized displacement trajectories. We then reconstruct the dense displacement field by averaging the overlapping regularized trajectories.
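Reconstruction by averaging overlapping patches can be sketched as an accumulate-and-count pass: each regularized patch is added into a sum buffer at its location, a per-voxel counter records how many patches covered the voxel, and the sum is divided by the count. The array layout (patches spanning the full temporal extent, indexed by their corner voxel) and the function name are assumptions for illustration.

```python
import numpy as np


def average_overlapping(patches, corners, field_shape):
    """Average overlapping regularized patches back into a dense field.

    patches: (n, px, py, pz, T, 3) regularized trajectory patches.
    corners: (n, 3) integer index of each patch's first voxel.
    field_shape: (X, Y, Z, T, 3). Voxels covered by no patch are set to NaN.
    Assumes each patch spans the full temporal extent T.
    """
    acc = np.zeros(field_shape)
    count = np.zeros(field_shape[:3])
    px, py, pz = patches.shape[1:4]
    for patch, (i, j, k) in zip(patches, corners):
        acc[i:i + px, j:j + py, k:k + pz] += patch
        count[i:i + px, j:j + py, k:k + pz] += 1
    out = np.full(field_shape, np.nan)
    covered = count > 0
    out[covered] = acc[covered] / count[covered][:, None, None]
    return out


# Usage: three 2-voxel patches overlapping along x with constant values 1, 3, 5.
patches = np.stack([np.full((2, 1, 1, 2, 3), v) for v in (1.0, 3.0, 5.0)])
corners = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]])
dense = average_overlapping(patches, corners, (4, 1, 1, 2, 3))
print(dense[:, 0, 0, 0, 0])  # [1. 2. 4. 5.]
```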

2.3 Soft-Threshold Outlier Regularization

Next, we outline our soft-threshold regularization approach. As described in Fig. 2, training the network requires pairs of noisy and ground-truth trajectory patches. However, similar to [5], we observed oversmoothing of initially well-tracked trajectories. Hence, better tracking performance was achieved when the learned regularization function was applied only to outlier trajectory patches (detected by stacking an additional neural network). However, regularizing only selected trajectory patches created spatial displacement discontinuities that caused high derivatives and noisy strain estimates. Therefore, instead of applying hard-threshold regularization (i.e., deciding whether to regularize a given trajectory), we implicitly learned soft-threshold regularization by training our MLP architecture simultaneously on ground-truth/ground-truth pairs and noisy/ground-truth pairs. The MLP thus learned to handle initially well-tracked trajectory patches via a learned identity function and poorly tracked trajectory patches via a learned regularization function. In this way, the trade-off between preserving good signal and smooth spatial regularization is learned.
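The soft-threshold training data can be sketched as the union of the two kinds of pairs: noisy patches mapped to ground truth and ground-truth patches mapped to themselves, shuffled together. The helper name and the 1:1 ratio of the two pair types are assumptions; the text does not state the mixing ratio.

```python
import numpy as np


def soft_threshold_pairs(X_noisy, Y_gt, seed=0):
    """Combine noisy->GT pairs with GT->GT (identity) pairs, then shuffle.

    The 1:1 ratio of the two pair types is an assumption for illustration.
    """
    X = np.concatenate([X_noisy, Y_gt], axis=0)  # inputs: noisy patches, then clean patches
    Y = np.concatenate([Y_gt, Y_gt], axis=0)     # targets: always the ground truth
    perm = np.random.default_rng(seed).permutation(len(X))
    return X[perm], Y[perm]


# Usage: 3 noisy patches (all ones) with zero ground truth, flattened to length 5.
X, Y = soft_threshold_pairs(np.ones((3, 5)), np.zeros((3, 5)))
print(X.shape, int(X.sum()), int(Y.sum()))  # (6, 5) 15 0
```

Since clean inputs must map to themselves, the network cannot simply smooth everything; it has to learn how much to correct each input.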

Fig. 3. Multiview Learning Architecture for integrating two tracking methods

Fig. 4. RFBM vs. FNT tracking error at a cross-sectional slice of the myocardium for ladprox. RFBM error is higher near the boundaries but lower inside the myocardium

2.4 Combining Complementary Methods via Multiview Learning

Next, we describe our multi-view MLP architecture. As illustrated in Fig. 4, RFBM performs better within the myocardium, while FNT performs better near the myocardial boundaries. RFBM and FNT may therefore complement each other in these regions. Inspired by the multi-view learning framework [9], we take trajectory patches from two different methods (i.e., RFBM and FNT) and combine them at the input layer of the regularization network, as shown in Fig. 3.
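One plausible reading of combining the two methods at the input layer is feature-wise concatenation of co-located patches, sketched below; the exact combination used in Fig. 3 may differ, and the function name is hypothetical. Under this reading, the network input doubles in length while the target remains a single ground-truth patch.

```python
import numpy as np


def multiview_input(patches_a, patches_b):
    """Flatten co-located patches from two trackers and concatenate them
    feature-wise, giving one input vector per location for the MLP."""
    a = patches_a.reshape(len(patches_a), -1)
    b = patches_b.reshape(len(patches_b), -1)
    if a.shape != b.shape:
        raise ValueError("both views must provide patches of the same shape")
    return np.concatenate([a, b], axis=1)


# Usage: 4 patch locations of size 5 x 5 x 5 x 32 x 3 from each tracker.
rfbm = np.zeros((4, 5, 5, 5, 32, 3))
fnt = np.ones((4, 5, 5, 5, 32, 3))
x = multiview_input(rfbm, fnt)
print(x.shape)  # (4, 24000)
```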

3 Experiments and Results

We resampled each voxel to \(0.5\,\mathrm{mm}^3\), giving an image size of 75\(\,\times \,\)75\(\,\times \,\)61 voxels. To test our method, we used a leave-one-image-out scheme, training on 7 images and testing on the 8th. Training patches were sampled with a stride of 2 in each direction, with patch size 5\(\,\times \,\)5\(\,\times \,\)5\(\,\times \,\)32\(\,\times \,\)3 (the last dimension holding the x, y, and z displacement components) for normal-geometry images and 5\(\,\times \,\)5\(\,\times \,\)5\(\,\times \,\)39\(\,\times \,\)3 for dilated-geometry images, yielding around 100,000 training patches. Test patches were sampled with a stride of 1 (around 22,000 patches). For each MLP, we used three hidden layers with 1000 neurons each and a dropout probability of 0.2. The average test time is around 800 s.
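The patch sampling described above can be sketched as a strided sliding window over the spatial axes of the Lagrangian displacement field, with each patch keeping the full temporal and displacement-component axes. The optional mask that restricts patches to the myocardium is an assumption (the text does not say how patch locations are restricted), and the function name is hypothetical.

```python
import numpy as np


def sample_patches(field, size=5, stride=2, mask=None):
    """Extract 4D trajectory patches from a Lagrangian displacement field.

    field: (X, Y, Z, T, 3) displacements of every voxel over all frames.
    mask:  optional (X, Y, Z) boolean array, e.g. a myocardium mask (assumed);
           a patch is kept only if its centre voxel lies inside the mask.
    Returns patches (n, size, size, size, T, 3) and their corner indices (n, 3).
    """
    X, Y, Z = field.shape[:3]
    c = size // 2
    corners = [
        (i, j, k)
        for i in range(0, X - size + 1, stride)
        for j in range(0, Y - size + 1, stride)
        for k in range(0, Z - size + 1, stride)
        if mask is None or mask[i + c, j + c, k + c]
    ]
    if not corners:
        return np.empty((0, size, size, size) + field.shape[3:]), np.empty((0, 3), dtype=int)
    patches = np.stack([field[i:i + size, j:j + size, k:k + size] for i, j, k in corners])
    return patches, np.array(corners)


# Usage on a small 9 x 9 x 9 volume with 4 frames.
field = np.zeros((9, 9, 9, 4, 3))
train_patches, _ = sample_patches(field, stride=2)  # corners at 0, 2, 4 per axis
test_patches, _ = sample_patches(field, stride=1)   # corners at 0..4 per axis
print(train_patches.shape, test_patches.shape)  # (27, 5, 5, 5, 4, 3) (125, 5, 5, 5, 4, 3)
```

With the patch sizes above, each sample flattens to 5 × 5 × 5 × 32 × 3 = 12,000 values for normal geometry and 5 × 5 × 5 × 39 × 3 = 14,625 values for dilated geometry, which sets the input width of the MLP.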

3.1 Quantitative Results

We quantitatively evaluated the performance of our algorithm on dense trajectories. Table 1 shows that applying the neural network-based spatiotemporal regularization (NNSTR) to RFBM, FNT, and FFD yielded significant improvements in tracking accuracy for all three methods over both initial tracking and dictionary learning-regularized trajectories (DL) [5]. In addition, combining RFBM and FNT in the multi-view learning framework further improved the tracking accuracy by leveraging the complementary nature of FNT and RFBM tracking.

We also analyzed performance via regional strain analysis. We computed strain as \(E_f = \frac{1}{2}[\nabla U_f + (\nabla U_f)^T + (\nabla U_f)\cdot (\nabla U_f)^T]\), where \(U_f\) is the Lagrangian dense displacement at frame f. We projected the strain tensor onto the clinically relevant radial (Rad.), circumferential (Cir.), and longitudinal (Long.) directions and summarize the strain performance improvements in Table 2.
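The strain computation and projection can be sketched with finite differences. The quadratic term \((\nabla U_f)\cdot(\nabla U_f)^T\) above corresponds to indexing the gradient as \((\nabla U)_{ij} = \partial u_j/\partial X_i\); the sketch below uses the transposed convention \(G_{ij} = \partial u_i/\partial X_j\), under which the same Green-Lagrange strain reads \(\frac{1}{2}(G + G^T + G^T G)\). The radial, circumferential, and longitudinal unit vectors require a left-ventricular coordinate system that is not specified here, so they are taken as inputs; function names are hypothetical. Multiply by 100 to report strain in %.

```python
import numpy as np


def green_lagrange_strain(U, spacing=(1.0, 1.0, 1.0)):
    """Green-Lagrange strain of a Lagrangian displacement field at one frame.

    U: (X, Y, Z, 3) displacement of each material point; spacing in mm per axis.
    Uses G[..., i, j] = dU_i / dX_j, so E = 0.5 * (G + G^T + G^T G).
    Returns E with shape (X, Y, Z, 3, 3).
    """
    G = np.stack(
        [np.stack(np.gradient(U[..., i], *spacing), axis=-1) for i in range(3)],
        axis=-2,
    )
    GT = np.swapaxes(G, -1, -2)
    return 0.5 * (G + GT + GT @ G)


def project_strain(E, n):
    """Normal strain n^T E n along unit direction(s) n, e.g. radial,
    circumferential or longitudinal; n may be (3,) or per voxel (X, Y, Z, 3)."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)
    return np.einsum("...i,...ij,...j->...", n, E, n)


# Usage: a uniform 10% stretch along the first axis.
x = np.arange(6.0)[:, None, None] * np.ones((6, 6, 6))
U = np.zeros((6, 6, 6, 3))
U[..., 0] = 0.1 * x
E = green_lagrange_strain(U)
print(round(float(project_strain(E, [1, 0, 0])[3, 3, 3]), 6))  # 0.105 = 0.1 + 0.5 * 0.1**2
```

The example shows why the quadratic term matters: a 10% stretch gives a Green-Lagrange normal strain of 0.105 rather than the small-strain value of 0.1.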

3.2 Qualitative Results

Figure 5 shows the median strain curves within each segment of the mid-cavity according to the American Heart Association (AHA) 17-segment standard. RFBM estimates radial strain poorly due to relatively high deformation (see also Table 1). FNT estimates radial strain well because it restricts the tracking space to the myocardial surfaces and captures high deformations. However, FNT tends to underestimate circumferential strain due to the lack of surface features that capture torsion, while RFBM captures rotational motion well. Applying NNSTR to RFBM and FNT individually indeed yielded improvements. Further combining RFBM and FNT using the proposed multi-view architecture, thus exploiting the complementary nature of these two methods, produced better overall results for both radial (Fig. 5a) and circumferential (Fig. 5b) strains. Figure 6 shows that NNSTR and the combined method considerably reduced spatial noise, producing more clinically plausible results. In the case of lcx, the combined method leveraged FNT to produce a better estimate than regularizing RFBM alone.

Table 1. Median tracking error (mm) per frame, compiled over all 8 studies and all trajectories within the myocardium
Table 2. Median strain error (%) per frame between estimated and ground-truth strain, compiled over all 8 studies and all trajectories within the myocardium
Fig. 5. Strain (\(\%\)) vs. time in the mid-cavity according to the AHA 17-segment model

Fig. 6. Radial strain during end-systole produced with RFBM, regularized RFBM, combined method, and GT interpolated at epicardium

Finally, we trained the multi-view learning architecture combining RFBM and FNT on all 8 synthetic images and applied the learned network to a completely different set of in vivo open-chest canine data (N = 5 studies) acquired using our Philips iE33 scanner and X7-2 probe (conducted in compliance with Institutional Animal Care and Use Committee policies). For each canine study, we applied NNSTR to a baseline image and a corresponding image with occlusion of the left anterior descending (LAD) artery to simulate high stenosis. Figure 7 shows example displacements from RFBM and the corresponding regularized displacements, which are smoother and more physically plausible than the original RFBM results. Figure 8 shows example radial strain for both the baseline and high-stenosis cases. Again, our multi-view architecture learned the complementary nature of FNT and RFBM and produced radial strain that resembles that of FNT. We expected motion abnormalities at the left ventricle (LV)-right ventricle (RV) junction due to the LAD occlusion; these are captured in the radial strain map of the combined method.

Fig. 7. Displacements at end-systole from canine images in the horizontal, vertical, and longitudinal directions for RFBM and regularized RFBM with combined architecture

Fig. 8. Radial strain (%) at end-systole from FNT, RFBM, and the combined architecture estimated on canine images. Strain from the combined method shows the expected dysfunction at the LV-RV junction

4 Conclusions

In this work, we proposed a learning-based method for spatiotemporal regularization of myocardial tracking. The regularization procedure was learned by feeding 4D Lagrangian displacement trajectories to a multi-layered perceptron (MLP) network. We showed the effectiveness of our method on three distinct tracking methods: RF block matching (RFBM), non-rigid registration (FFD), and a graph-based myocardial surface tracking method (FNT). We further proposed a multi-view learning framework that learned to leverage the complementary nature of FNT and RFBM to produce better estimates than individual regularization. Finally, we showed how our learned regularization model can potentially be applied to other echocardiography datasets via domain adaptation.