1 Introduction

Statistical shape models (SSMs) [1] of abdominal organs have played important roles in computer-aided diagnosis applications, including image segmentation and assessment of the degree of abnormality. Despite considerable research on SSMs for adult organs, infant abdominal organs have not been extensively investigated. The limitation of conventional SSMs is that intersubject variability is not separated from the age-related shape variation that is not negligible in pediatric datasets. Therefore, we propose a method to construct a spatiotemporal SSM, or temporally varying SSM, which we refer to as stSSM, that reflects both the intersubject and temporal sources of variation.

Over the last decade, there has been considerable research on spatiotemporal shape and image analysis [2,3,4,5,6,7,8]. Its main objective is longitudinal atlas construction that represents the trend of anatomical changes of the population. One of the most successful approaches is the kernel regression-based technique. Davis et al. [2] performed kernel regression on the Riemannian manifold of diffeomorphisms. Serag et al. [3] improved the kernel regression method using adaptive bandwidth selection to deal with non-uniformly distributed ages. Another approach is to utilize subject-specific longitudinal information (i.e., same subject taken at different timepoints) using the spatiotemporal atlas construction process as described by Durrlemann et al. [4], which would be the better choice if longitudinal data is available.

Analysis of intersubject shape variability is also required for stSSM, which can be even more challenging as intersubject change is also a function of time that has not been addressed in [2,3,4]. An early study dealing with stSSM is found in [5], where principal component analysis (PCA) was first applied to all samples, and then correlation analysis was performed to find the mode best describing the temporal change. Qiu et al. [6] performed a subject-wise analysis by registering all images to a common reference frame under the assumption that the individual change of a specific point of time can be propagated by a parallel transportation method [9]. Kishimoto et al. [7] developed a spatiotemporal statistical model of the anatomical landmarks of human embryos, where group-wise PCA was first applied to build the statistical models of each developmental stage. These statistical models were then smoothly interpolated between neighboring stages to define a continuous subspace. Similarly, Alam et al. [8] employed subject-weighted PCA instead of group-wise PCA to construct a neonatal brain shape model. The main weakness of these methods [7, 8] is that independent PCA does not ensure the consistency of the principal modes of shape variation among different timepoints.

In this paper, we propose a novel stSSM construction method for the pediatric liver based on level set shape representation. The key contributions of this paper are summarized below:

  1. 1.

    While most related studies depend on diffeomorphic registration technique [2,3,4,5,6], we employ a level set-based shape representation that can be analyzed without registration and can intrinsically deal with topological changes.

  2. 2.

    We use kernel regression with adaptive bandwidth to compute the average and covariance matrices at given timepoints, which can be seen as a generalization of the method by [8]. We employ an adaptive bandwidth method to overcome inhomogeneous distributions due to patient age.

  3. 3.

    The principal modes of shape variation are calculated by PCA with an additional regularization term to ensure the temporal consistency of the principal modes that is not considered in conventional studies [7, 8].

  4. 4.

    We use nonlinear shape interpolation based on level set morphing for data augmentation [10, 11].

  5. 5.

    The effectiveness of the proposed method is quantitatively verified based on generalization, specificity [12], and the smoothness of the shape evolution.

We note that level set is just one example of shape representation method we employed to demonstrate the concept of our spatiotemporal SSM. In principle, our statistical analysis method is applicable to other shape representations such as point distribution model and diffeomorphism-based model [2,3,4,5,6].

2 Methods

Let us denote a set of \(n\) training data as \(\{(S_{i},t_{i})\}_{i\in \{1,\ldots ,n\}}\) where \(S_{i}\subset \mathrm {\Omega }\) (\(i\in \{1,\ldots ,n\}\)) is the \(i\)-th training shape in the \(m\)-dimensional image domain \(\mathrm {\Omega }\subset \mathbb {R}^{m}\) and \(t_{i}\in [t_{min},t_{max}]\) is the associated patient age ranging between \(t_{min}\) and \(t_{max}\). In order to provide a linear operation between shapes, they are represented as the zero sublevel set of the function: \(\phi _{i}:\varvec{r}\in \mathrm {\Omega }\mapsto \mathbb {R}\) that satisfies:

$$\begin{aligned} S_{i}=\{\varvec{r}\in \mathrm {\Omega }|\phi _{i}(\varvec{r})<0\}. \end{aligned}$$
(1)

Specifically, \(\phi _{i}\) is the signed distance function of \(S_{i}\) in this study. Sampling \(\phi _{i}\) on every grid point in an image domain \(\{\varvec{r}_{1},\varvec{r}_{2},\ldots ,\varvec{r}_{p}\}\in \mathrm {\Omega }\) yields the p-dimensional shape representation vector \(\varvec{\phi }_{i}\):

$$\begin{aligned} \varvec{\phi }_{i}={[\phi _{i}(\varvec{r}_{1}),\phi _{i}(\varvec{r}_{2}),\ldots ,\phi _{i}(\varvec{r}_{p})]}^{\intercal }. \end{aligned}$$
(2)

For the sake of computational efficiency, instead of \(\varvec{\phi }_{i}\), we use a compact representation of the level set function: \(\varvec{x}_{i}\approx {P}^{\intercal }(\varvec{\phi }_{i}-\overline{\varvec{\phi }})\in \mathbb {R}^{q}\) in the following process, where: \(\overline{\varvec{\phi }}=\sum _{i=1}^{n}\varvec{\phi }_{i}\) is the mean and \(P\in \mathbb {R}^{p\times q}\) is the matrix with \(q\) (\(\le n-1\)) eigenmodes of the shape variation derived from spatially-weighted PCA [13].

The goal of stSSM is to decompose the shape variation into temporal and individual modes. To this end, inspired by [4], we assume that each shape \(\varvec{x}_{i}\) is an instance of a random process:

$$\begin{aligned} \varvec{x}_{i}=\varvec{X}(\varvec{\alpha }_{i};t_{i})+\varvec{\epsilon }_{i}, \end{aligned}$$
(3)

where \(\varvec{X}(\varvec{\alpha };t)\) is a time-dependent shape model with the subject-specific shape parameter: \(\varvec{\alpha }={[\alpha _{1},\ldots ,\alpha _{d}]}^{\intercal }\), and \(\varvec{\epsilon }_{i}\) is a Gaussian random noise i.i.d. over \(\varvec{\alpha }\) and t. In this study, we consider a linear model:

$$\begin{aligned} \varvec{X}(\varvec{\alpha };t)=\varvec{\mu }(t)+\sum _{j=1}^{d}\varvec{v}_{j}(t)\sigma _{j}(t)\alpha _{j}, \end{aligned}$$
(4)

where \(\varvec{\mu }(t)\in \mathbb {R}^{q}\), \(\varvec{v}_{j}(t)\in \mathbb {R}^{q}\) and \(\sigma _{j}(t)\in \mathbb {R}\) are the mean, eigenmode and standard deviation of the \(j\)-th (\(j\in \{1,\ldots ,d\}\)) principal mode of the shape variation at time point \(t\), respectively.

We estimate the spatiotemporal trajectory \(\varvec{\mu }(t)\) via the Nadaraya-Watson kernel regression method [14], where we employ an adaptive-kernel width strategy in order to achieve stability against an inhomogeneous distribution of the patient age. Instead of the time-varying kernel width used in [3], we employ a subject-wise kernel width inspired by [15] in order to make it differentiable with respect to time (which is required to calculate the tangent of \(\varvec{\mu }(t)\), cf. Eq.(6)):

$$\begin{aligned} \varvec{\mu }(t)=\sum _{i=1}^{n}w_{i}(t)\varvec{x}_{i},\;\text {where}\;w_{i}(t)=\frac{\frac{1}{\lambda _{i}}k\bigl (\frac{t_{i}-t}{\lambda _{i}h}\bigr )}{\sum _{i=1}^{n}\frac{1}{\lambda _{i}}k\bigl (\frac{t_{i}-t}{\lambda _{i}h}\bigr )}, \end{aligned}$$
(5)

where \(k(y)=\frac{1}{\sqrt{2\pi }}\exp \left( -\frac{y^{2}}{2}\right) \) is the Gaussian kernel function and \(h>0\) is a global bandwidth parameter. The bandwidth is adaptively changed by a subject-specific factor \(\lambda _{i}>0\), which is calculated as: \(\lambda _{i}=\bigl \{\tilde{f}(t_{i})\big /\sum _{i=1}^{n}\tilde{f}(t_{i})\bigr \}^{-\beta }\), where \(\tilde{f}(t)=\frac{1}{nh}\sum _{i=1}^{n}k\bigl (\frac{t_{i}-t}{h}\bigr )\) is a prior kernel estimator with a fixed bandwidth \(h\). The coefficient \(\beta \in [0,1]\) is the sensitivity parameter and we used \(\beta =0.5\), which has been reported to achieve good results [15]. The derivative of \(\varvec{\mu }(t)\) with respect to time is given as: \(\dot{\varvec{\mu }}(t)=\dot{w}_{i}(t)\varvec{x}_{j}\) where

$$\begin{aligned} \dot{w}_{i}(t)=\frac{\frac{1}{\lambda _{i}}k^{\prime }\bigl (\frac{t_{i}-t}{\lambda _{i}h}\bigr )\bigl \{\sum _{i=1}^{n}\frac{1}{\lambda _{i}}k\bigl (\frac{t_{i}-t}{\lambda _{i}h}\bigr )\bigr \}-k\bigl (\frac{t_{i}-t}{\lambda _{i}h}\bigr )\bigl \{\sum _{i=1}^{n}\frac{1}{\lambda _{i}^{2}}k^{\prime }\bigl (\frac{t_{i}-t}{\lambda _{i}h}\bigr )\bigr \}}{\lambda _{i}h\bigl \{\sum _{i=1}^{n}\frac{1}{\lambda _{i}}k\bigl (\frac{t_{i}-t}{\lambda _{i}h}\bigr )\bigr \}^{2}}, \end{aligned}$$
(6)

which will be used as the 0-th principal component (see Sect. 2.1).

Owing to the fact that kernel regression demonstrates some estimation difficulty for timepoints around the boundary of range \([t_{min},t_{max}]\), this is generally referred to as a boundary problem. Hence, we assume that our model supports the range of \(T=[t_{min}+h,t_{max}-h]\).

2.1 PCA with Temporal Regularization

After the average trajectory is obtained, the kernel weighted covariance matrix at time point t is computed as:

$$\begin{aligned} C(t)=\frac{\sum _{i=1}^{n}w_{i}(t)(\varvec{x}_{i}-\varvec{\mu }(t)){(\varvec{x}_{i}-\varvec{\mu }(t))}^{\intercal }}{1-\sum _{i=1}^{n}w_{i}(t)^{2}}. \end{aligned}$$
(7)

We then propose an objective function of PCA with temporal regularization. The \(j\)-th time-dependent principal mode \(\varvec{v}_{j}\) for \(j\ge 1\) is calculated as:

$$\begin{aligned} \varvec{v}_{j}=\mathop {\mathrm {arg}\,\text {max}}\limits _{\varvec{u}}\int _{T}\left\{ {\varvec{u}(t)}^{\intercal }C(t)\varvec{u}(t)-\gamma \left\| \dot{\varvec{u}}(t)\right\| _{2}^{2}\right\} dt\nonumber \qquad \qquad \\ \mathrm {s.t.}\quad {\varvec{u}(t)}^{\intercal }\varvec{u}(t)=1,\,{\varvec{u}(t)}^{\intercal }\varvec{v}_{l}(t)=0\quad \forall l\in \{0,\ldots ,j-1\},\,\forall t\in T, \end{aligned}$$
(8)

where we assume that the direction of the 0-th principal mode is the tangent of the average trajectory, i.e., \(\varvec{v}_{0}(t)=\dot{\varvec{\mu }}(t)/\left\| \dot{\varvec{\mu }}(t)\right\| \). When \(\gamma =0\), this is equivalent to applying sample weighted PCA for each point of time independently. The calculation of the unit vector \(\varvec{u}(t)\) that maximizes \({\varvec{u}(t)}^{\intercal }C(t)\varvec{u}(t)\) is equivalent to the objective of PCA. The second term evaluates the squared magnitude of the first derivative of the coordinate change, which imposes the temporal smoothness of the coordinate change.

Given that \(\varvec{u}(t)\) is a unit vector, the second term is approximated as \(\left\| \dot{\varvec{u}}(t)\right\| _{2}^{2}\approx \frac{2}{\varDelta t^{2}}(1-{\varvec{u}(t)}^{\intercal }\varvec{u}(t+\varDelta t))\) with a sufficiently small interval \(\varDelta t\). Therefore, given regularly-sampled time points \(\mathcal {T}=\{\tau _{k}\}_{k=1}^{m}\subset T\) with interval \(\varDelta t=\frac{t_{max}-t_{min}}{m-1}\), Eq. (8) can be reformulated as:

$$\begin{aligned} (\varvec{v}_{j}(\tau _{1}),\ldots ,\varvec{v}_{j}(\tau _{m}))=\mathop {\mathrm {arg}\,\text {max}}\limits _{\varvec{u}}\left\{ \sum _{k=1}^{m}{\varvec{u}(\tau _{k})}^{\intercal }C(\tau _{k})\varvec{u}(\tau _{k})+\gamma ^{\prime }\sum _{k=1}^{m-1}{\varvec{u}(\tau _{k})}^{\intercal }\varvec{u}(\tau _{k+1})\right\} \nonumber \\ \mathrm {s.t.}\quad {\varvec{u}(t)}^{\intercal }\varvec{u}(t)=1,\,{\varvec{u}(t)}^{\intercal }\varvec{v}_{l}(t)=0\;\forall l\in \{0,\ldots ,j-1\},\,\forall t\in \mathcal {T}. \qquad \qquad \end{aligned}$$
(9)

where \(\gamma ^{\prime }=\frac{2}{\varDelta t^{2}}\gamma \). This type of problem is known to be a non-convex quadratically-constrained quadratic program (QCQP). To minimize the risk of getting trapped in local minima, we initialize \(\{\varvec{v}_{1},\ldots ,\varvec{v}_{d}\}\) with the optimal solution for \(\gamma ^{\prime }=0\), which can be calculated by pure PCA. Then, the optimal solution is obtained via an internal point algorithm [16] implemented in MATLAB Optimization Toolbox. Finally, the standard deviation along the \(j\)-th axis at the \(k\)-th time point is calculated in analogy with standard PCA:

$$\begin{aligned} \sigma _{j}(\tau _{k})=\sqrt{{\varvec{v}_{j}(\tau _{k})}^{\intercal }C(\tau _{k})\varvec{v}_{j}(\tau _{k})}. \end{aligned}$$
(10)

2.2 Data Augmentation

To overcome the shortage of training data, data augmentation was performed based on level set morphing [10], which is a nonlinear morphing technique and was successfully applied to improve the performance of SSM for abdominal organs [11]. Level set morphing is applied to the pairs of cases (\(S_{i}\), \(S_{j}\)) whose age difference \(|t_{i}-t_{j}|\) is 1–2 years. A sequence of three intermediate shapes with equal surface distance is generated for each pair of shapes. Patient ages for the intermediate shapes are given by linear interpolation.

3 Experiments

We collected 42 healthy CT volumes from children aged 2 weeks to 95 months with manual annotations of the liver. The image size was: \(512 \times 512 \times (109-747)\) voxels and the resolution was: \((0.291-0.625)\times (0.291-0.625)\times (0.625-3.000)\) mm3. The patient age in days ranged from \(t_{min}=16\) to \(t_{max}=2912\). The global kernel width and the regularization parameter were empirically determined and set as \(h=300\) [days] and \(\gamma =2\times 10^{5}\), respectively. In order to validate the study, 42 cases were separated into two groups: \(G_{in}\) and \(G_{out}\), which are 34 and 8 cases inside and outside the age range supported by the kernel regression \(T=[t_{\mathrm {min}}+h,t_{\mathrm {max}}-h]\), respectively. The group \(G_{out}\) was used for training purposes only during the two-fold cross validation study on \(G_{in}\). Prior to statistical shape analysis, spatial standardization with rigid transformation was performed based on four manual landmarks on the bone, i.e., the spinous processes of the 8th and 10th vertebrae, and the left and right 8th rib tips. The image size after standardization was \(256 \times 256 \times 256\) voxels with 1.0 mm isotropic voxel size. The number of modes \(q\) was set to achieve at least 90% of the cumulative contribution ratio (CCR), and the number of modes for the time-dependent principal modes was fixed to \(d=2\) for which CCR was around 80%. The performance of the model was evaluated using generalization and specificity, which assess the ability of the model to reconstruct unknown shapes and to generate acceptable instances, respectively, where Jaccard index (JI) was used for the similarity measure.

Table 1 summarizes the performance indices obtained by the three models in the cross validation study. The proposed stSSM achieved the best generalization with a statistically significant difference when compared to the other models. In summary, our stSSM performed the best out of the three models.

Table 1. Comparison of the generalization and the specificity measures between three SSMs. Brackets indicate the statistical significance (\(**:p<0.01\), \(*:p<0.05\)).
Fig. 1.
figure 1

Comparison of the shape reconstruction ability between different SSMs. The cases A and B are the most-improved and least-improved cases in terms of the reconstruction ability by use of the regularization term, respectively.

Figure 1 displays the comparison of the reconstructed liver surface between different models focusing on two cases, the most improved and the least improved cases in JI by use of the proposed stSSM compared with the case without a regularization term (\(\gamma =0\)), respectively. Case A showed a large improvement (\(+6.8\) pt; \(0.7058\rightarrow 0.7742\)) with a large difference around the left lobe of the liver (see the red circle in Fig. 1). Even for the worst case B, the drop in JI was relatively small (\(-1.6\) pt; \(0.8189\rightarrow 0.8026\)) and had a less visual difference between the constructed surfaces. We also found that the reconstruction ability of the conventional SSM was consistently lower. These findings are consistent with the generalization value in Table 1.

Shapes generated by stSSM with and without temporal regularization are shown in Fig. 2(a) and (b), respectively. Since we expect that changes along time \(t\) for the fixed shape parameter \(\alpha \) represent the shape development for the specific patient, the shape deformation displayed in each row should be smooth. We measured and displayed in Fig. 2 the smoothness of temporal change by accumulating the surface distance between neighboring time points for each row (shape parameters). We found that our stSSM showed a consistently lower value in accumulated surface distance, which suggests the applicability of our SSM to predict shape development for unknown subjects.

Fig. 2.
figure 2

Liver shapes generated form (a) the proposed stSSM and (b) the stSSM without temporal regularization (\(\gamma =0\)). The red numerals on each row show the accumulated value of the surface distance in [mm] between the neighboring time points.

4 Conclusion

In this paper, we proposed a method to construct an stSSM of a pediatric liver. The temporal change of the average shape is calculated using kernel regression with an adaptive bandwidth, and the time-dependent shape variation is learned using PCA with a regularization term to improve the temporal consistency of the temporal variation. The proposed model achieved a generalization of 0.7741 and specificity of 0.7755, which are superior to the stSSM without temporal regularization and the non-spatiotemporal SSM.

In the future, we will collect more samples including longitudinal data for further improvement and evaluation of the performance of stSSM. We also plan to develop computer-aided diagnosis applications including automated organ segmentation and assessment of the degree of abnormality for pediatrics.