1 Introduction

COPD is the fourth leading cause of death, and 24 million people are afflicted in the US alone [14]. Lung registration could support the assessment of COPD phenotypes as well as of disease progression [6]; moreover, improved treatment of patients and speedups in clinical workflows are expected if registration is used for follow-up inspection and/or motion estimation in treatment planning [12, 14]. Hence, registration of inhale/exhale and longitudinal data is critical, but also challenging due to the presence of large non-linear deformations, which should be diffeomorphic [12]. LDDMM can address these difficulties [2]. However, its drawbacks are large memory use and high computational costs: e.g., run times of up to three hours are reported in [17] for moderately sized data of \(256\times 192 \times 180\) voxels using 32 CPUs and 128 GB RAM. We present a new variant of LDDMM that

  • has significantly lower computational and memory demands,

  • employs the well-suited distance measure Normalized Gradient Fields [11],

  • and is as accurate as standard LDDMM and state-of-the-art algorithms.

Lung registration faces severe challenges. Diffeomorphic modeling of large lung motion (on the order of 50 mm for inhale/exhale [5]) is necessary [12]. Lung volume changes are large: a doubling of lung capacity is not unusual during inspiration [5]. Additionally, acquisitions with limited dose deteriorate CT image quality [14].

Related Work: Sakamoto et al. [17] applied LDDMM in pulmonary computer-aided diagnosis (CAD) to analyze and detect growing or shrinking nodules in follow-up CT scans with encouraging results, but at high computational cost. A finite-dimensional Lie algebra was introduced and integrated into a geodesic shooting approach by Zhang and Fletcher [19] to considerably speed up computations. Their experiments on 3D brain MRI data showed no loss of accuracy. Risser et al. [15] used appropriate kernels for LDDMM to cope with sliding motion occurring at the interface between lungs and the ribcage, as this motion is not diffeomorphic. Our focus is on the registration of inner lung structures, hence we use lung segmentations as in [16] and thereby avoid sliding issues.

Contributions: We focus on the relaxation formulation of LDDMM [2], but the approach could easily be generalized to shooting formulations [1, 18]. Our scheme exploits the smoothness of the transformations and velocities computed with LDDMM, which allows discretizing them at a lower spatial resolution and consequently reduces memory requirements substantially. The image match is computed at the original resolution, thereby maintaining accurate results. Furthermore, we employ the Normalized Gradient Fields (NGF) image similarity measure [11], which aligns edges, e.g. vessels. We use NGF to cope with large volume changes that influence tissue densities and thus the absorption of X-rays: the resulting gray values of the parenchyma are quite different for inhale and exhale scans. Hence, the sum of squared differences might not be an appropriate similarity measure.

Outline: In Sect. 2 the optimization problem, including the energy with the NGF distance measure, and the memory-efficient implementation of LDDMM are described. We evaluate our method on 20 3D lung CT scan pairs in Sect. 3, compare the results to the state of the art and show that a coarser discretization of velocities and transformations is sufficient for an accurate registration. In Sect. 4 results and possible extensions are discussed.

2 Methods

2.1 Continuous Model and LDDMM Background

Let \(I_0, I_1 : \varOmega \rightarrow \mathbb {R}\) denote the moving and fixed images with domain \(\varOmega \subset \mathbb {R}^d\). LDDMM uses space- and time-dependent velocity (\(v : \varOmega \times [0,1] \rightarrow \mathbb {R}^d\)) and transformation (\(\varphi : \varOmega \times [0,1] \rightarrow \mathbb {R}^d\)) fields and seeks the minimizer \((v^*,\varphi ^*)\) of the following energy subject to a transport constraint [2, 7]:

$$\begin{aligned} \left. \begin{array}{l} E(v,\varphi ) := \int _0^1 \langle Lv(\cdot ,t),Lv(\cdot ,t) \rangle \mathrm {d}t + \frac{1}{\sigma ^2} D(\varphi (\cdot ,1) ;I_0,I_1) \rightarrow \min , \\ \mathrm {s.\;t.} \,\,\,\, \varphi _t + J_\varphi v = \varvec{0}, \quad \varphi (\varvec{x},0) = \varvec{x} \text { for all } \varvec{x} \in \varOmega , ~ t \in [0,1] \end{array} \right\} \end{aligned}$$
(1)

We denote the partial time derivative as \(\varphi _t\) and the Jacobian with respect to the spatial coordinates as \(J_\varphi \). L is a suitable differential operator enforcing smoothness of v. In our method the Helmholtz operator \(Lv := \gamma v - \alpha \varDelta v\) with \(\alpha ,\gamma > 0\) was used, which is a standard choice for LDDMM registration, cf., e.g., [2, 7]. In all experiments we fixed \(\gamma = 1\). D is a general distance measure and \(\sigma > 0\) its weighting parameter. As motivated in Sect. 1 we use the Normalized Gradient Fields (NGF) distance measure [11]. NGF was successfully applied to lung CT registration [9, 16] in the following adaptation of the original formulation:

$$\begin{aligned} D(\varphi _1;I_0,I_1) := \int _{\varOmega }1-\left( \frac{\langle \nabla I_0(\varphi _1(\varvec{x})), \nabla I_1(\varvec{x})\rangle _{\eta }}{\Vert \nabla I_0(\varphi _1(\varvec{x}))\Vert _{\eta } \Vert \nabla I_1(\varvec{x})\Vert _{\eta }} \right) ^2 \ \mathrm {d}\varvec{x}, \end{aligned}$$
(2)

with \(\varphi _1 := \varphi (\cdot ,1)\), \(\langle \varvec{u},\varvec{v} \rangle _\eta := \eta ^2 + \sum _{i=1}^d u_i v_i\) and \(\Vert \varvec{u}\Vert _\eta := \sqrt{\langle \varvec{u},\varvec{u} \rangle _\eta }\). The parameter \(\eta > 0\) is used to decide if a gradient is considered noise or edge [11]. Throughout this paper \(\eta = 100\) is used as proposed in [16].
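
To make (2) concrete, the following is a minimal MATLAB sketch of the discretized NGF distance, assuming the warped moving image \(I_0(\varphi _1)\) has already been computed by interpolation; the function name ngf_distance and the simplified boundary handling are ours for illustration, not the paper's implementation.

```matlab
function d = ngf_distance(I0warped, I1, h, eta)
% Minimal sketch of the NGF distance (2). I0warped is I0 composed with
% phi_1 (precomputed by interpolation), I1 the fixed image, h = [h1 h2 h3]
% the voxel sizes and eta the edge parameter. Gradients are approximated
% with MATLAB's gradient (central differences in the interior).
[g0x, g0y, g0z] = gradient(I0warped, h(1), h(2), h(3));
[g1x, g1y, g1z] = gradient(I1,       h(1), h(2), h(3));
ip = eta^2 + g0x.*g1x + g0y.*g1y + g0z.*g1z;   % <.,.>_eta
n0 = sqrt(eta^2 + g0x.^2 + g0y.^2 + g0z.^2);   % ||.||_eta of grad I0(phi_1)
n1 = sqrt(eta^2 + g1x.^2 + g1y.^2 + g1z.^2);   % ||.||_eta of grad I1
r  = 1 - (ip ./ (n0.*n1)).^2;                  % pointwise NGF integrand
d  = sum(r(:)) * prod(h);                      % midpoint quadrature rule
end
```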

Following the steps of [7] we add the transport equation constraint of (1) into the objective functional employing Lagrange multipliers \(\lambda : \varOmega \times [0,1] \rightarrow \mathbb {R}^d\). After some straightforward calculations we obtain the necessary conditions as

$$\begin{aligned}&L^\dagger Lv + J_\varphi ^\top \lambda = 0, \end{aligned}$$
(3)
$$\begin{aligned}&\varphi _t + J_\varphi v = \varvec{0}, ~\varphi (\varvec{x},0) = \varvec{x}, \end{aligned}$$
(4)
$$\begin{aligned}&\lambda _t + J_{\lambda }v + \mathrm {div}(v)\lambda = \varvec{0}, \quad \lambda (\varvec{x},1) = -(1/\sigma ^2) \nabla _{\varphi } D(\varphi _1(\varvec{x});I_0,I_1) \end{aligned}$$
(5)

for all \(\varvec{x} \in \varOmega \), \(t \in [0,1]\) and a differentiable D. Solving (4) means transporting the transformation maps according to the velocities whereas solving (5) is the flow of the image mismatch (given by the derivative of the distance measure) backward in time, cf. [7]. As in [7] we apply the smoothing kernel \((L^\dagger L)^{-1}\) before updating the velocities in (3) to improve numerical optimization:

$$\begin{aligned} p := v + (L^\dagger L)^{-1}(J_\varphi ^\top \lambda ). \end{aligned}$$
(6)
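
For illustration, the smoothing step in (6) can be implemented efficiently because \(L^\dagger L = (\gamma - \alpha \varDelta )^2\) is diagonalized by trigonometric transforms. The following MATLAB sketch applies \((L^\dagger L)^{-1}\) to one scalar component; the function name and the simplifying assumption of periodic boundary conditions (so that fftn applies) are ours, whereas with the Neumann conditions of Sect. 2.2 a cosine transform would take the place of the FFT.

```matlab
function vs = apply_inv_LtL(r, alpha, gamma, h)
% Sketch: apply (L'L)^{-1} with Lv = gamma*v - alpha*Laplacian(v) to one
% scalar component r on an n1 x n2 x n3 grid with spacing h = [h1 h2 h3].
% Assumes periodic boundaries, where fftn diagonalizes the discrete
% Laplacian; apply componentwise for vector fields.
[n1, n2, n3] = size(r);
[k1, k2, k3] = ndgrid(0:n1-1, 0:n2-1, 0:n3-1);
% Eigenvalues of the negative discrete Laplacian (3-point stencil):
lap = 2*(1 - cos(2*pi*k1/n1))/h(1)^2 ...
    + 2*(1 - cos(2*pi*k2/n2))/h(2)^2 ...
    + 2*(1 - cos(2*pi*k3/n3))/h(3)^2;
sym = (gamma + alpha*lap).^2;       % Fourier symbol of L'L
vs  = real(ifftn(fftn(r) ./ sym));  % diagonal solve in frequency space
end
```
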
Algorithm 1. Memory-efficient multi-level LDDMM registration.

2.2 Discretization and Algorithm

From now on we use \(d = 3\). Velocities v and transformations \(\varphi \) were discretized on a nodal grid in 4D (3D + t) [11]. The number of time points was set to \(n_4 = 11\). The number of points in space varied during the multi-level optimization but was chosen equal for all spatial directions, i.e. \(n_1 = n_2 = n_3\). Defining \(\bar{n}:=n_1n_2n_3\) and using linear ordering in space yields arrays \({\varvec{v}}, \varvec{\varphi } \in \mathbb {R}^{3\bar{n} \times n_4}\).

We use central differences and Neumann boundary conditions for the discretization of \(\mathrm {div}\), L, \(L^\dagger \) and \(J_\varphi \). NGF and its derivative were implemented according to [11]. Equations (4) and (5) were solved with a fourth order Runge-Kutta scheme. Given images with \(\bar{m}:=m_1m_2m_3\) voxels, smoothed and downsampled versions with \(m_j^i = \lfloor m_j \cdot 2^{-F+i} \rfloor \) were computed for \(j=1,2,3\) and \(i=1,\dots ,F\) [11]. Problem (1) was then solved using a coarse-to-fine resolution multi-level strategy with \(F\) levels.
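
As an illustration of the transport step, the sketch below advances (4), i.e. \(\varphi _t = -J_\varphi v\), by one classical fourth-order Runge-Kutta step. It assumes \(\varphi \) and v are stored as \(n_1 \times n_2 \times n_3 \times 3\) arrays and that v is held fixed over the step; in the full scheme \(v(\cdot ,t)\) is evaluated at the intermediate times, and the boundary handling of MATLAB's gradient is a simplification of our Neumann discretization.

```matlab
function phi = rk4_transport_step(phi, v, dt, h)
% One fourth-order Runge-Kutta step for Eq. (4): phi_t = -J_phi * v.
% phi, v: n1 x n2 x n3 x 3 arrays; dt: time step; h = [h1 h2 h3]: grid
% spacing. Sketch only: v is treated as constant during the step.
k1 = rhs(phi,             v, h);
k2 = rhs(phi + dt/2 * k1, v, h);
k3 = rhs(phi + dt/2 * k2, v, h);
k4 = rhs(phi + dt   * k3, v, h);
phi = phi + dt/6 * (k1 + 2*k2 + 2*k3 + k4);
end

function r = rhs(phi, v, h)
% r = -J_phi * v with central differences; component ordering follows
% MATLAB's meshgrid convention (x along columns), matching gradient.
r = zeros(size(phi));
for c = 1:3
    [gx, gy, gz] = gradient(phi(:,:,:,c), h(1), h(2), h(3));
    r(:,:,:,c) = -(gx.*v(:,:,:,1) + gy.*v(:,:,:,2) + gz.*v(:,:,:,3));
end
end
```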

Choosing \(n_i = m_i\) exceeds common memory limitations and results in an extremely expensive solution of (4) and (6). As v and \(\varphi \) are assumed to be smooth functions, it is usually not necessary to use a high resolution for \({\varvec{v}}\) and \(\varvec{\varphi }\). This motivates our choice of \(n_i < m_i\). Nevertheless, we use image information at the highest resolution. Hence \({\varvec{v}}\) and \(\varvec{\varphi }\) have to be prolongated to image resolution to solve (5). This can be done gradually, i.e. for only one time point at a time, which reduces memory consumption substantially. Following [9], we use a prolongation matrix \({\varvec{P}}\) that linearly interpolates \({\varvec{v}}\) and \(\varvec{\varphi }\) on the cell-centered grid points of the images. \({\varvec{P}}\) is sparse but large and does not need to be kept in memory [9]. We use matrix notation, but implemented a matrix-free operator.
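
A matrix-free application of \({\varvec{P}}\) can be sketched with MATLAB's interpn, evaluating a nodal-grid field at the cell-centered voxel positions of one time point. The function below is our illustrative version, assuming a domain \([0,\omega _1] \times [0,\omega _2] \times [0,\omega _3]\); it is not the paper's implementation.

```matlab
function uc = prolongate(un, omega, m)
% Matrix-free sketch of P: trilinearly interpolate one scalar component
% un, given on an n1 x n2 x n3 nodal grid over the domain
% [0,omega(1)] x [0,omega(2)] x [0,omega(3)], at the cell centers of an
% m(1) x m(2) x m(3) voxel image. Apply per component and time point.
n  = size(un);
x1 = linspace(0, omega(1), n(1));            % nodal grid coordinates
x2 = linspace(0, omega(2), n(2));
x3 = linspace(0, omega(3), n(3));
hc = omega ./ m;                             % voxel sizes of the image
[q1, q2, q3] = ndgrid(hc(1)/2:hc(1):omega(1), ...
                      hc(2)/2:hc(2):omega(2), ...
                      hc(3)/2:hc(3):omega(3));   % cell-centered points
uc = interpn(x1, x2, x3, un, q1, q2, q3, 'linear');
end
```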

After (5) is solved, the adjoint \(\varvec{\lambda }\) has to be brought back to grid resolution by computation of \({\varvec{P}}^\top \varvec{\lambda }\), which is then used to solve (3). A memory-efficient way to do this is storing \({\varvec{P}}^\top \varvec{\lambda }\) instead of \(\varvec{\lambda }\) and again performing the computations gradually. The numerical solution of Eqs. (3) to (6) is performed in the multi-level registration described in Algorithm 1. It is useful to start Algorithm 1 with the result of a pre-registration \(\psi \). Given a (preferably diffeomorphic) \(\psi \), only the initial condition of (4) has to be adapted: \(\varphi (\cdot ,0) = \psi (\cdot )\).
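
To make the adjoint operation \({\varvec{P}}^\top \varvec{\lambda }\) concrete, the following sketch assembles \({\varvec{P}}\) explicitly as a Kronecker product of sparse 1D linear interpolation matrices; this is for illustration on small grids only (the actual computation is matrix-free), and the function names are ours.

```matlab
function P = build_prolongation(n, m, omega)
% Sketch: sparse prolongation P mapping a scalar field on an
% n(1) x n(2) x n(3) nodal grid to the cell centers of an
% m(1) x m(2) x m(3) image grid; P'*lambda(:) then realizes the adjoint.
% Kronecker order matches MATLAB's column-major linearization.
P = 1;
for k = 3:-1:1
    P = kron(P, interp1d(n(k), m(k), omega(k)));
end
end

function A = interp1d(n, m, om)
% Sparse 1D linear interpolation from n nodal points on [0,om] to the
% m cell centers of a grid with spacing om/m.
xn = linspace(0, om, n)';            % nodal coordinates (spacing hn)
hn = om / (n - 1);
xc = (om/m) * ((1:m)' - 0.5);        % cell-center coordinates
j  = min(floor(xc/hn) + 1, n - 1);   % index of the left nodal neighbor
w  = (xc - xn(j)) / hn;              % linear interpolation weights
A  = sparse([(1:m)'; (1:m)'], [j; j+1], [1-w; w], m, n);
end
```

For a scalar field u on the nodal grid, P*u(:) prolongates it to the image grid and P'*lambda(:) restricts the adjoint, applied per component and time point.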

The discretized objective function was optimized with an L-BFGS approach [13], saving the last \(M=5\) iterate vectors for the approximation of the inverse Hessian. The maximum number of iterations was set to \(k_{\max } = 50\). Additional stopping criteria proposed in [11] were used to terminate the while loop in Algorithm 1. During the optimization an Armijo line search with parameters \(\beta _k = 0.5^{k-1}\), \(k^\mathrm {LS}_{\max } = 30\) and \(c_1 = 10^{-6}\) was used to guarantee a decrease of the objective function.
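
The stated Armijo rule can be sketched as follows; fobj is a handle returning the objective value, and J0 and dJ0 denote the objective value and the directional derivative at the current iterate (our notation, for illustration).

```matlab
function [x, ok] = armijo(fobj, x0, dx, J0, dJ0)
% Sketch of the line search of Sect. 2.2: try beta_k = 0.5^(k-1),
% k = 1,...,30, until the sufficient-decrease condition
% fobj(x0 + beta*dx) <= J0 + c1*beta*dJ0 holds, with c1 = 1e-6.
c1 = 1e-6; kmax = 30; ok = false; x = x0;
for k = 1:kmax
    beta = 0.5^(k - 1);
    xt = x0 + beta*dx;
    if fobj(xt) <= J0 + c1*beta*dJ0   % Armijo condition
        x = xt; ok = true;
        return;
    end
end
end
```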

3 Experiments and Results

The proposed method was applied to the DIR-Lab 4DCT [4] (see Sect. 3.1) and COPD [5] (see Sect. 3.2) data sets, as they are a well-known benchmark for lung CT registration. In total 20 inhale/exhale scan pairs are available, each including 300 expert annotated landmarks. The scans were masked with a lung segmentation obtained using [10]. For all experiments a PC with a 3.4 GHz Intel i7-2600 quad-core CPU was used. We implemented Algorithm 1 in Matlab and employed C++ code for the computations of D, L and \({\varvec{P}}\).

3.1 Experiments for Varying n

Since we assume smooth velocities and transformations, we investigated the influence of a coarse grid on registration accuracy. The 4DCT data sets were affinely pre-registered to obtain \(\psi \) and Algorithm 1 was used with \(\alpha = 200\), \(\sigma = 0.01\) and \(F=4\) for all registrations. Initially we set \(\bar{n} = 5^3\), \(9^3\) and \(17^3\), respectively; the final number of grid points in space for the registration on full image resolution was accordingly \(33^3\), \(65^3\) and \(129^3\). We compared the performance for the different \(\varvec{n}\) using the mean distance of expert annotated landmarks, runtime and memory consumption; see Table 1 for results. One-sided paired t-tests with significance level 0.05 were used to identify improvements in mean landmark error compared to the \(65^3\) method. Registrations were repeated without the finest image pyramid level to see if a reduction in image data influences accuracy. Again, paired t-tests were used to compare to the methods with the same number of grid points that were registered with all levels.

Using \(129^3\) grid points does not provide the best results, which could be explained by the optimization being more prone to local minima due to the larger number of degrees of freedom. This is visible for case 7, where \(129^3\) is the fastest method but yields the worst results. Using \(65^3\) grid points results in significantly better accuracy compared to both \(33^3\) and \(129^3\), although the improvements with respect to \(33^3\) are small. Regarding memory consumption and runtime, \(33^3\) is clearly the best choice while \(129^3\) requires the most resources. However, memory requirements do not grow by a factor of 8 per refinement, as there is an overhead for storing the images and computing \(\varvec{\lambda }\); therefore \(65^3\) offers the best compromise. Omitting the last image pyramid level results in a mean memory consumption of 0.80, 1.98 and 10.91 GB for \(33^3\), \(65^3\) and \(129^3\) grid points, respectively. Using the full image data is beneficial for the registration accuracy, which is also confirmed by the significance tests.
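
A rough back-of-the-envelope estimate (ours, assuming double precision and a single copy of each nodal field) supports this: storing \({\varvec{v}}\) and \(\varvec{\varphi }\) alone requires

$$\begin{aligned} 2 \cdot 3 \cdot \bar{n} \cdot n_4 \cdot 8 \text { bytes} \approx 0.02,~ 0.15 \text { and } 1.13 \text { GB} \quad \text {for } n_i = 33,~ 65,~ 129, \end{aligned}$$

i.e. only a fraction of the measured totals, so most memory is indeed spent on image-resolution quantities such as \(\varvec{\lambda }\) and the prolongated fields.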

Table 1. Mean landmark distance per DIR-Lab 4DCT data set, required memory and runtime. Rounding to the next regular grid voxel was performed prior to computing the distance. Best values are printed bold. Significant differences to \(65^3\) are indicated by \(^*\) and to the respective method with full image information by \(^+\).
Fig. 1. Overlay of a coronal slice of the fixed (blue) and (transformed) moving image (orange) of data set 10 of the DIR-Lab COPD data [5]. Aligned structures are displayed in gray or white due to the addition of RGB values. Yellow circles highlight improvements. (Color figure online)

3.2 Comparison to State-of-the-Art Algorithms

In the following experiments we used a thin-plate spline (TPS) pre-registration with keypoints [8] to provide a very accurate initial estimate on the COPD data sets [5]. We ensured that \(\psi \) is diffeomorphic by removing keypoints whose set of six nearest neighbors changes by more than one point after TPS computation, as this indicates a violation of topology preservation. This procedure reduced the number of keypoints to approximately one quarter of the original number. Parameters were fixed to \(\sigma = 0.1\), \(\alpha = 85\) and \(F=5\). Motivated by the results of Sect. 3.1 we used \(\varvec{n} = (65,65,65,11)\) on the finest level. The resulting means and standard deviations of the landmark distances and the p-values of one-sided paired t-tests are given in Table 2. The keypoint filtering yields a diffeomorphic pre-registration at the cost of higher mean landmark distances, as can be seen by comparing the MRF and Prereg. columns.

We compared the proposed method to the MILO [3] and MRF [8] algorithms, which are the two best ranked methods for registration of the DIR-Lab COPD data sets. Results of the NLR [16] algorithm are reported because it also uses the NGF distance measure. The proposed registration achieves the best result in most cases and performed significantly better than the pre-registration, MILO [3] and NLR [16]. It is also mostly better than MRF, whose transformations may contain singularities, whereas our method is diffeomorphic. The registrations took on average 46 min and used at most 5.9 GB of RAM. Figure 1 shows an exemplary central coronal slice of data set 10 without registration, after TPS pre-registration and after the proposed LDDMM registration. The TPS pre-registration captures the large motion and aligns most of the vessels. The subsequent LDDMM registration improves the alignment of the lung boundaries and nearby vessels as well as of small vessels, as highlighted by the yellow circles.

Table 2. Mean and standard deviation of the distances of the 300 expert annotated landmarks per DIR-Lab COPD data set. All values are given in mm. Rounding to the next regular grid voxel was performed prior to computing the distances. Results for state-of-the-art methods are the reported ones from [3, 8, 16]. The p-values were computed performing paired t-tests with the hypothesis that the respective method is better than the proposed one. Best values are printed bold.

4 Discussion

In this paper a memory-efficient LDDMM scheme was introduced that employs the well-suited NGF distance measure for the registration of pulmonary CT data. Although the method was tested only on lung CT images, it is applicable to other anatomical sites or modalities. We accomplish at least a 25-fold reduction of memory requirements (cf. supplementary material) compared to full resolution LDDMM and obtain diffeomorphic mappings without impairment of registration quality. The proposed method achieves the currently lowest average landmark error (1.03 mm) and advances the state of the art on challenging lung CT data [5]. A trustworthy registration is vital for clinical applications such as COPD classification or CAD of lung nodule evolution [6, 14]. A possible extension of our method would be a lung mask similarity term as described in [16] to improve the lung boundary alignment. Another option is the integration of sliding motion into the model as in [15]. We also plan to submit results to the EMPIRE10 challenge [12] for additional evaluation of our method.