Keywords

1 Introduction

In the last few decades, lots of medical image segmentation methods have been proposed. Among these presented approaches, active shape model method has achieved state-of-the-art segmentation accuracy [5]. For achieving robustness, ASM utilizes principal component analysis-based statistical shape models to incorporate high level a priori shape knowledge of the structure to be segmented. A key step of building SSMs is to establish shape correspondence between all training shapes. So far various methods have been proposed to establish shape correspondence, and these approaches can be roughly divided into five major categories: mesh-to-volume registration, mesh-to-mesh registration, parameterization-to-parameterization registration, volume-to-volume registration, and population-based optimization. In particular, Gollmer et al. [4] proposed a evaluation of different groupwise correspondence approaches, including their re-parameterization strategies, objective functions, and optimization methods. However, a main limiting of current shape correspondence establishment approaches, is that the structure to be modeled has to be of the same topology as the sphere, besides they are much computationally intensive.

In mesh-to-volume registration based methods, a landmarked deformable surface model is fitted to all the segmented training images. After the convergence of the evolution of deformable model, the point correspondence is determined by the final landmark locations of the deformable template [5]. Kaus et al. [7] proposed a 3D elastically deformable model for the automated establishment of shape correspondence from segmented images, Their scheme can approximate and predict unseen shapes well. Shang and Dossel [9] then improved the above-mentioned method by utilizing the force equilibrium concept of deformable surface model. Shen et al. [10] presented an adaptive-focus deformable model to establish shape correspondence based on the hand-labeled 3-D brain images. An attribute vector for each landmark of the deformable template was introduced to preserve its geometric shape while evolution. Later, Zhao and Teoh [12] modified the AFDM method by exploiting a “bridge over” framework and made use of it to construct SSMs of brain ventricles. Clogenson et al. [1] raised a model-fitting based method to establish shape correspondence from manually segmented CT images. For getting accurate and robust results, we exploit an automatic shape initialization approach to derive an initial shape that has high overlap with the object of interest, such that the deformable models can then evolve more locally, and the proposed model was utilized for the construction of femur statistical shape model to demonstrate its accuracy and efficiency.

2 Method

In this section, we describe the details of the proposed statistical shape models construction process based on deformable surface model. The whole procedure is shown in Fig. 1.

Fig. 1.
figure 1

Flowchart of the proposed method.

2.1 Initialization of Deformable Surface Model

Providing with the training samples with ground truth segmentation, their simplex mesh representations are obtained firstly, which are defined as the dual of the triangular mesh derived via Marching Cubes algorithm and mesh smoothing methods. We then select the training sample that is similar to an average femur shape as the template mesh for the deformable models.

For purpose of obtaining a fine initial shape for the deformable models, the Gaussian mixture model (GMM) based point set registration method [6] was employed by aligning the template mesh with other training meshes via an affine transformation. Specifically, we depict the point sets of the template mesh as a Gaussian mixture \(f(\mathbf {x}) = \sum _{ i=1 }^{m} {\alpha _{i} \phi (\mathbf {x}\, | \, \mu _{i},\Sigma _{i})}\), the point sets of other training meshes are denoted by \(g(\mathbf {x}) = \sum _{ j=1 }^{n} {\beta _{j} \phi (\mathbf {x} \, | \, \nu _{j},\varGamma _{i})}\). An affine transformation can be indicated by a \(3 \times 3\) matrix \(\mathbf {A}\), and a translation vector \(\mathbf {t}\). For the purpose of convenience, the matrix \(\mathbf {A}\) is factorized as an orthogonal matrix \(\mathbf {Q}\) and a symmetric positive definite matrix \(\mathbf {S}\), i.e., \(\mathbf {A}=\mathbf {Q}\mathbf {S}\) [6]. Then the affine transformation can be obtained by minimizing the following \(L_2\) distance between Gaussian mixtures \(f(\mathbf {x})\) and \(g(\mathbf {x})\) [6]:

$$\begin{aligned} d_{L_2}(f,g,\mathbf {A},\mathbf {t}) = \int {(g-f_{\mathbf {A},\mathbf {t}})^2dx} = \int {(f_{\mathbf {A},\mathbf {t}}^2-2f_{\mathbf {A},\mathbf {t}}g+g^2)dx}, \end{aligned}$$
(1)

Since \(\int {g^2}\) is independent of the affine transformation, only \(\int {f_{\mathbf {A},\mathbf {t}}^2}\) and \(\int {f_{\mathbf {A},\mathbf {t}}g}\) are needed to consider. For the latter term:

$$\begin{aligned} \int {f_{\mathbf {A},\mathbf {t}}g \, dx} = \sum _{ i=1 }^{m} {\sum _{ j=1 }^{n} {\alpha _{i}\beta _{j} \phi (\mathbf {x} \, | \, \mu _{i}-\nu _{j},\mathbf {Q}(\Sigma _{i})\mathbf {Q}^T+\varGamma _{i})}}. \end{aligned}$$
(2)

Considering that the gradients associated with affine transformation have analytical solutions [6], fast gradient-based numerical optimization techniques can be deployed to minimize the objective function.

After deriving the affine transformation it was used to transform the template mesh to the space of other training meshes.

2.2 Evolution Method

The greedy algorithm was utilized as evolution method to conduct deformable 2-simplex meshes to the object of interest. Let \(V_i\) be the voxel in the volumetric image containing vertex \(P_i\). In each iteration, a \(w\times w\times w\) cubic window around \(V_i\) is searched, and the energy is computed at each voxel within the window (see Fig. 2a). The energy at vertex \(P_i\) is defined as a combination of both external and internal energy normalized within the window:

$$\begin{aligned} \begin{aligned} \varvec{E}(P_i) =&\ \alpha \varvec{E}_{int}(P_i) + \beta \varvec{E}_{ext}(P_i)\\ =&\ \alpha \left( \varvec{E}_{Tangent}(P_i) + \varvec{E}_{Normal}(P_i) \right) + \beta \varvec{E}_{VFC}(P_i), \end{aligned} \end{aligned}$$
(3)

where \(\varvec{E}_{Tangent}\) and \(\varvec{E}_{Normal}\) are internal energy, \(\varvec{E}_{VFC}\) is the VFC external energy,

\({N_{i}^{t}}\) is defined as the \(w\times w\times w\) cubic window around voxel \(V_i\). The position \({Q_{i}^{t}}\) with minimum energy within the window \({N_{i}^{t}}\) is selected as:

$$\begin{aligned} Q_{i}^{t} = \mathop {\arg \,\min }\limits _{{P_{j} \in N_{i}^{t} }} {\varvec{E}}\left( {P_{j} } \right) . \end{aligned}$$
(4)

The vertex \(P_i\) is moved only along its normal direction \(\mathbf {n}_i\), rather than directly moved to \({Q_{i}^{t}}\) as in the classical greedy algorithm (see Fig. 2b):

$$\begin{aligned} {P_{i}^{t+1}} = {P_{i}^{t}}+ \left( ({Q_{i}^{t}}-{P_{i}^{t}})\cdot \mathbf {n}_i \right) \mathbf {n}_i. \end{aligned}$$
(5)
Fig. 2.
figure 2

The greedy algorithm: (a) The energy function is calculated at vertex \(P_i\) and voxels in the \(w\times w\times w\) cubic window around \(V_i\), and the point with the smallest energy is selected as the target position of \(P_i\). (b) The vertex \(P_i\) is moved only along its normal direction \(\mathbf {n}_i\).

2.3 Internal Energy Computation

The original internal force proposed in [3] is utilized as internal energy, which is decomposed into tangential energy and normal energy:

$$\begin{aligned} \begin{aligned} \varvec{E}_{int}(P_i) = \;&\varvec{E}_{Tangent}(P_i) + \varvec{E}_{Normal}(P_i) \\ =&{\bigg \Vert \frac{1}{3}( P_{N_1(i)}+P_{N_2(i)}+P_{N_3(i)} )-F_i \bigg \Vert }^2 + {\bigg \Vert L(r_i,d_i,\tilde{\phi }_i)-L(r_i,d_i,\phi _i) \bigg \Vert }^2, \end{aligned} \end{aligned}$$
(6)

2.4 VFC External Energy Computation

Vector field convolution (VFC) field [8] is largely used to solve the problems associated with traditional external force and can conduct the active contour into thin boundary. Besides, comparing with the classical GVF external force, VFC force shows superior stability to noise and initialization, and takes less computational time [8]. The 3D VFC field \(\mathbf {v}=[u, v, w]\) is defined as the convolution of a vector field kernel \(\mathbf {k}\) with the edge map f generated from the input image I:

$$\begin{aligned} \mathbf {v}&= f \otimes {\mathbf {k}} , \end{aligned}$$
(7)

where \(\otimes \) indicates linear convolution.

Generally the edge map f for gray-level input images is defined as the gradient magnitude of the blurred image:

$$\begin{aligned} f= | \nabla [G_{\sigma }] \otimes I|, \end{aligned}$$
(8)

where \(G_{\sigma }\) is a 3D Gaussian function with standard deviation \(\sigma \), and \(\nabla \) is the gradient operator. The vector field kernel \(\mathbf {k}=[u_k, v_k, w_k]\) can be computed as follows:

$$\begin{aligned} \mathbf {k} = m {\mathbf {n}} , \end{aligned}$$
(9)

where m represents the magnitude of the vector and \(\mathbf {n}\) denotes the unit vector.

In the practical implementation, the 3D VFC field \(\mathbf {v}=[u, v, w]\) can be calculated by convolving the edge map f with each component of the vector field kernel \(\mathbf {k}=[u_k, v_k, w_k]\) [8]:

$$\begin{aligned} u&= f \otimes u_k, \end{aligned}$$
(10)
$$\begin{aligned} v&= f \otimes v_k, \end{aligned}$$
(11)
$$\begin{aligned} w&= f \otimes w_k. \end{aligned}$$
(12)

The continuous vector field kernel \(\mathbf {k}\) is approximated by a discrete matrix

$$\begin{aligned} \{\mathbf {k}(x,y,z)\, | \, x,y,z=-R,\ldots ,-1,0,1,\ldots ,R\}, \end{aligned}$$
(13)

where R is the chosen kernel radius.

To handle with the major issues of the external energy, we modified it for greedy algorithm as external potential energy, and the following enhancements was made to the original VFC method:

  1. 1.

    The magnitude of the VFC field \(\big \Vert {\mathbf {v}}\big \Vert \) is used as external energy instead of the force field itself.

  2. 2.

    The edge map is normalized before the computation of VFC energy,

Therefore, the VFC energy at vertex \(P_i\) can be defined as:

$$\begin{aligned} \varvec{E}_{VFC}(P_i) = \big \Vert {\mathbf {v}(P_i)}\big \Vert . \end{aligned}$$
(14)

When vertex \(P_i\) is within the volumetric image, its external energy can be derived through a trilinear interpolation of the external energy at its neighboring image grid points; otherwise, its VFC energy is set to be the maximal value of the external energy in the volumetric image.

2.5 Statistical Shape Model Construction

Firstly, all training shapes are spatially aligned into a common coordinate frame using the generalized Procrustes analysis. The corresponding covariance matrix is defined as:

$$\begin{aligned} \mathbf {S} = \frac{1}{K-1} \sum _{i=1}^{K} (\mathbf {x}_i - \mathbf {\bar{x}})(\mathbf {x}_i - \mathbf {\bar{x}})^T, \end{aligned}$$
(15)

where \(\mathbf {\bar{x}}\) is the mean shape vector of all subjects: \( \mathbf {\bar{x}} = \frac{1}{K} { \sum _{i=1}^{K} { \mathbf {x}_i } } \). Then the principal component analysis-based statistical shape model can be established by an eigen-decomposition on the covariance matrix \(\mathbf {S}\):

$$\begin{aligned} \mathbf {S} = \mathbf {U}\mathbf {D}\mathbf {U}^T, \end{aligned}$$
(16)

Then any valid shapes of femur structure can be approximated by a linear combination of the first c modes of variation:

$$\begin{aligned} \mathbf { x } = \mathbf {\bar{x}} + \sum _{m=1}^{c} { b_m\phi _m }, \end{aligned}$$
(17)

\(b_m\) is the shape parameter constrained to \(\left[ -3\sqrt{\lambda _m}, 3\sqrt{\lambda _m}\right] \).

3 Experimental Results

3.1 Datasets and Parameters

To evaluate the performance of our the proposed method, we applied it to construct the femur statistical shape models based on a local database. The database consists of 5 femur CT scans from 5 different patients (3 males, 2 females) with ground truth segmentation manually segmented by radiology expert. These CT scans were provided by the Weihai Municipal Hospital, and were acquired by use of a 64-row multidetector CT scanner (Brilliance 64; Philips) with resolution between 0.58 mm and 0.67 mm, and a slice thickness of 1.5 mm.

3.2 Qualitative and Quantitative Analysis

Our proposed method was compared with the spherical harmonics (spharm) shape correspondence method [11]. Considering that the spharm method requires the input shape to be of spherical topology, all holes in the training images must be filled firstly. The spharm method maps all the input shapes to a common parameter domain through an area-preserving spherical parameterization. Finally we derived the corresponded training shapes by sampling the rotated spherical parameterization via icosahedron subdivision.

Figure 3 illustrates the first principal modes of variation for the constructed femur shape model, with the variation of a specific mode between \(-3\sigma \) and \(+3\sigma \). It can be seen that the first mode accounts for the whole volume size of the femur structure.

Fig. 3.
figure 3

The first principal mode of variation for the constructed femur shape model.

In this section, by using our proposed method, two metrics (compactness and generalization ability) are used to quantitatively evaluate the quality of the constructed femur shape model.

Compactness. The compactness is defined as the cumulative variance of the Mth mode used in the shape reconstruction [2]:

$$\begin{aligned} C(M) = \sum _{m=1}^{M} {\lambda _m}, \end{aligned}$$
(18)

where \(\lambda _m\) is the mth eigenvalue. For the compactness, the smaller the value is, the better the constructed shape model.

Figure 4 shows the compactness for both spharm and our proposed method with varying number of modes of variation. For all the employed number of modes, our method achieves better compactness than the spharm method. Table 1 shows the compactness for the two compared shape prior modeling methods with 5 modes. Specifically, the compactness of our method is 1.34 mm, and spharm method’s compactness is more than three times that of our method. Therefore, the shape model constructed by our method is more compact than that of the spharm method.

Fig. 4.
figure 4

Compactness for both spharm and our proposed method.

Table 1. The compactness for the two compared shape prior modeling methods with 5 modes.

Generalization Ability. The generalization ability quantifies the ability of the constructed shape model to represent new shape instances of the same structure class. It is measured based on the training data by performing leave-one-out tests. Specifically, a shape model is built by using all but one training shape \(\mathbf {x}_i\), and then the constructed model is employed to reconstruct the excluded shape \(\mathbf {x}_i\). The approximation error is defined as the distance between the excluded shape \(\mathbf {x}_i\) and its reconstructed shape \(\mathbf {x}_{i}^{'}\). The generalization ability is the average approximation error of all the performed K tests [2]:

$$\begin{aligned} G(M) = \frac{1}{K} \sum _{i=1}^{K} { \Vert \mathbf {x}_i - \mathbf {x}_{i}^{'}(M) \Vert }^2, \end{aligned}$$
(19)

where the reconstructed shape \(\mathbf {x}_{i}^{'}(M)\) is defined as a linear combination of the first M modes of variation:

$$\begin{aligned} \mathbf {x}_{i}^{'}(M) = \mathbf {\bar{x}} + \sum _{m=1}^{M} { b_m\phi _m }. \end{aligned}$$
(20)

The generalization ability for both spharm and our proposed method is shown in Fig. 5 with different number of modes of variation. And our method shows consistently better generalization ability than the spharm method in all the used number of modes. The mean and standard deviation of the generalization ability for the two compared shape prior modeling methods are shown in Table 2 by using 5 modes. The generalization ability of our method is 0.81 mm, while the generalization ability of spharm method is nearly 50% higher than that of our method. These demonstrate that our method can reconstruct new shape instances more accurately than the spharm method.

Fig. 5.
figure 5

Generalization ability for both spharm and our proposed method.

Table 2. The mean and standard deviation of the generalization ability for the two different shape prior modeling methods with 5 modes.

Computational Time. The presented scheme was implemented in C++ and tested on a computer of GB RAM. In our experiment, an average of about 12.6 min is taken to establish shape correspondence for each shape. For comparison, the time taken by the spharm method for the same task is about 1 h in average, and it thus proved the superiority of our proposed method on computational efficiency.

4 Conclusion

In this paper, a novel mesh-to-volume registration based method was proposed for automatically establishing shape correspondence. It integrates the computational simplicity of simplex meshes, speed of the greedy algorithm, and robustness of VFC in a unified system. Through extensive experiments on 5 femur CT scans from 5 different patients, we demonstrate that the quality of the constructed femur shape models by using the proposed method is much better than that of the classical spharm method. Moreover, the proposed approach achieves much higher computational efficiency than the spharm method.