1 Introduction

The function of the placenta affects fetal birth weight, growth, prematurity, and neurodevelopment, since it controls the transmission of nutrients from the maternal to the fetal circulatory system. Recent work [8] has shown that magnetic resonance imaging (MRI) can be used to evaluate the placenta during both normal and high-risk pregnancies. In particular, quantitative measurements such as placental volume and surface attachment to the uterine wall are required for identifying abnormalities. In addition, recording the structural appearance (e.g., placental cotyledons and shape) is essential for clinical qualitative analysis. The placenta is usually examined after birth on a flat surface, which provides a standard representation for obstetricians. Flat cutting planes, as common in radiology, show only a small part of the placenta. A 3D visualization is considered particularly useful for cases that require preoperative planning or surgical navigation (e.g., treatment of twin-to-twin transfusion syndrome). Hence, fully automatic 3D segmentation, correction of motion artifacts, and visualization are highly desirable for an efficient prenatal examination of the placenta in clinical practice.

Fast MRI acquisition techniques (single-shot fast spin echo, ssFSE) acquire single 2D images of the moving uterus and fetus fast enough that motion does not affect the image quality. However, 3D data acquisition and subsequent automatic segmentation are challenging because maternal respiratory motion and fetal movements displace the overall anatomy, which causes motion artifacts between individual slices, as shown in Fig. 1. Furthermore, the high variability of the placenta’s position, orientation, thickness, shape, and appearance prevents conventional image analysis approaches from succeeding.

Fig. 1.
Three orthogonal 2D planes from a motion corrupted 3D stack of slices showing a delineated placenta. The native scan orientation (a) shows no motion artifacts, while (b) and (c) do.

Related work: To the best of our knowledge, fully automatic segmentation of the placenta from MRI has not been investigated before. Most previous work in fetal MRI has focused on brain segmentation [2] and has very recently been extended to localize other fetal organs [6]. These methods rely on engineering visual features for training a classifier such as random forests. Stevenson et al. [9] present a semi-automatic approach for measuring the placental volume from motion-free 3D ultrasound with a random walker (RW) algorithm. Their method shows good inter-observer reproducibility but requires extensive user interaction and several minutes per segmentation. Even though ultrasound is fast enough to acquire a motion-free volume, the lack of structural information and weak tissue gradients make it useful only for volume measurements. Wang et al. [12] present an interactive method for the segmentation of the placenta from MR images, which requires user interaction to initialize the localization of the placenta. Their approach performs well on a small cohort of six subjects but shows a user-dependent variability in segmentation accuracy.

Contribution: In this paper we propose, for the first time, a fully automatic segmentation framework for the placenta from motion-corrupted fetal MRI. The proposed framework adopts convolutional neural networks (CNNs) as a strong classifier for image segmentation, followed by a conditional random field (CRF) for refinement. Our approach scales well to real clinical applications. We show how to use the resulting placental mask as initialization for slice-to-volume registration (SVR) techniques to compensate for motion artifacts. We also show how the resulting reconstructed volume can be used to provide a novel standardized view into the placental structures by applying shape skeleton extraction and curved planar reformation for shape abstraction.

2 Method

The proposed approach combines a 3D multi-scale CNN architecture for segmentation with a 3D dense CRF for segmentation refinement. This approach can be extended to compensate for motion and to provide a clinically useful visualization. Figure 2 shows an overview of the proposed framework.

Fig. 2.
The proposed framework for automatic placenta segmentation with extensions for motion correction and visualization.

Placenta segmentation: We adopt a 3D deep multi-scale CNN architecture [4] that is 11 layers deep and consists of two pathways to segment the placenta from the whole uterus. This multi-scale architecture has the advantage of capturing larger 3D contextual information, which is essential for detecting highly variable organs. The two pathways are complementary: the main pathway extracts local features, whereas the second one extracts larger contextual features. Multi-scale features are integrated efficiently by down-sampling the input image and processing the two pathways in parallel. To deal with the variations in the placenta’s appearance, we apply data augmentation during training by flipping the image around the main 3D axes (maternal orientation).
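The flipping augmentation described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the function name `flip_augment` and the choice to generate one flipped copy per axis are assumptions, since the text does not specify how the flips are combined or sampled during training.

```python
import numpy as np

def flip_augment(volume, labels):
    """Return the original (volume, labels) pair plus one flipped copy per 3D axis.

    Hypothetical helper: the paper only states that images are flipped
    around the main 3D axes, not how many copies are produced.
    """
    pairs = [(volume, labels)]
    for axis in range(3):
        # Flip image and label map together so they stay voxel-aligned.
        pairs.append((np.flip(volume, axis=axis), np.flip(labels, axis=axis)))
    return pairs

# Toy example with a tiny synthetic volume.
volume = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
labels = (volume > 11).astype(np.uint8)
augmented = flip_augment(volume, labels)
```

Flipping the label map with the same call as the image keeps the ground truth consistent with the augmented input.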

Although the multi-scale architecture can interpret contextual information, inference is still subject to misclassification. Hence, we apply a CRF to penalize inconsistencies in the segmentation by regularizing the classification priors with the relational consistency of their neighbors. We use a 3D fully connected CRF model [4, 7], which applies a linear combination of Gaussian kernels to define the pairwise edge potentials. Its energy is defined as \(E(\mathbf x ) = \sum _{i\in N}U(x_{i})+\sum _{i<j}V(x_{i},x_{j})\), where i and j are voxel indices. The unary potential U is given by the probabilistic predictions of the CNN classification, whereas the pairwise potential V is defined by

$$\begin{aligned} V(x_{i},x_{j}) = \mu (x_{i},x_{j}) \left( \omega _{1} e^{\left( - \frac{\left| p_{i} - p_{j} \right| ^{2}}{2\theta ^2 _{\alpha }} - \frac{\left| I_{i} - I_{j} \right| ^{2}}{2\theta ^2 _{\beta }} \right) } + \omega _{2} e^{ \left( - \frac{\left| p_{i} - p_{j} \right| ^{2}}{2\theta ^2 _{\gamma }} \right) } \right) , \end{aligned}$$

where I and p are intensity and position values. \(\mu (x_{i},x_{j})\) is a simple label compatibility function given by the Potts model [\(x_{i} \ne x_{j}\)]. Here, \(\omega _{1}\) controls how strongly nearby voxels of similar appearance are encouraged to take the same label, and \(\omega _2\) weights the smoothness kernel, which removes isolated regions. \(\theta _{\alpha }\), \(\theta _{\beta }\) and \(\theta _{\gamma }\) adjust the degree of similarity and proximity. We have chosen the configuration parameters heuristically, similar to [4]. Although this tissue classification approach is capable of segmenting the placenta robustly, the segmentation is still subject to inter-slice motion artifacts.
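To make the pairwise term concrete, the following sketch evaluates it for a single pair of voxels. It assumes the usual Gaussian form with negative exponents for both the appearance and the smoothness kernel. The function name and all default parameter values are placeholders chosen for illustration, not the configuration used in the experiments.

```python
import numpy as np

def pairwise_potential(x_i, x_j, p_i, p_j, I_i, I_j,
                       w1=1.0, w2=1.0,
                       theta_alpha=3.0, theta_beta=5.0, theta_gamma=3.0):
    """Pairwise CRF potential V(x_i, x_j) for one pair of voxels.

    Defaults are illustrative placeholders, not the paper's settings.
    """
    if x_i == x_j:
        return 0.0  # Potts model: only differing labels are penalized
    # Squared Euclidean distance between the voxel positions.
    d2 = float(np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2))
    # Squared intensity difference.
    di2 = (float(I_i) - float(I_j)) ** 2
    appearance = w1 * np.exp(-d2 / (2 * theta_alpha ** 2) - di2 / (2 * theta_beta ** 2))
    smoothness = w2 * np.exp(-d2 / (2 * theta_gamma ** 2))
    return float(appearance + smoothness)

# Neighbouring voxels with similar intensity but different labels: high penalty.
near = pairwise_potential(0, 1, (0, 0, 0), (1, 0, 0), 100, 102)
# Distant voxels with very different intensity: low penalty.
far = pairwise_potential(0, 1, (0, 0, 0), (10, 0, 0), 100, 180)
# Equal labels: no penalty.
same = pairwise_potential(1, 1, (0, 0, 0), (1, 0, 0), 100, 102)
```

A fully connected CRF evaluates this term implicitly for all voxel pairs through efficient filtering. The explicit per-pair form here is only meant to show how position and intensity enter the penalty.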

Placenta segmentation recovery: To tackle the motion artifacts caused by fetal and maternal movements, we combine our segmentation framework with a flexible motion compensation algorithm based on patch-to-volume registration (PVR) [3]. This technique requires multiple orthogonal stacks of 2D slices to provide a better reconstruction quality. It is based on splitting the input data into overlapping square patches or superpixels [1]. The motion-free 3D image is then reconstructed from the extracted patches using iterative super-resolution and 2D/3D registration steps. Motion-corrupted and misaligned patches are excluded during the reconstruction using an EM-based outlier rejection model. We extend this process to propagate the placental mask to the final reconstruction by evaluating an MR-specific point spread function, the registration-based transformation, and the learned confidence weights.
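The first step of the patch-based scheme, splitting a 2D slice into overlapping square patches, might look like the sketch below. The function name, the patch size, and the stride are hypothetical, and the registration, super-resolution, and outlier rejection steps of PVR [3] are not reproduced here.

```python
import numpy as np

def extract_patches(slice_2d, size, stride):
    """Split a 2D slice into square patches; patches overlap when stride < size.

    Returns a list of ((row, col), patch) tuples, where (row, col) is the
    top-left corner. Border remainders smaller than `size` are skipped
    in this simplified sketch.
    """
    patches = []
    height, width = slice_2d.shape
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            patches.append(((r, c), slice_2d[r:r + size, c:c + size]))
    return patches

# Toy 8x8 slice, 4x4 patches with 50% overlap.
slice_2d = np.arange(64).reshape(8, 8)
patches = extract_patches(slice_2d, size=4, stride=2)
```

Keeping the top-left coordinate with each patch makes it possible to map a patch back to its source slice after it has been registered independently.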

Placenta visualization: We present an extension of our placenta segmentation pipeline based on a novel application of shape abstraction using a flexible cutting plane. The plane is supported by a mean-curvature flow skeleton [10] generated from the triangulated polygonal mesh of the placenta segmentation and is textured similarly to curved planar reformation [5], see Fig. 3. Although this part is not evaluated thoroughly, clinicians indicated that such a representation is potentially desirable since it compares well to a flattened placenta after birth.

Fig. 3.
A native plane (a) cannot represent all structures of the placenta at once. Therefore, we use our segmentation method (b), correct the motion in this area using [3], project the placenta mask into the resulting isotropically resolved volume (c), extract the mean curvature flow skeleton [10] (black lines in (d)), use the resulting points to support a curved surface plane (e), and visualize this plane with curved planar reformation [5] (f). The plane in (f) covers only relevant areas, hence the gray value mapping can be adjusted automatically to emphasize placental structures.

3 Experimental Results

Data: We test our approach on two dissimilar datasets that differ in health status, gestational age, and scanning parameters. All scans have been ethically approved. Dataset I contains 44 MR scans of healthy fetuses at a gestational age of 20–25 weeks. The data were acquired on a Philips Achieva 1.5T, with the mother lying at a 20\(^{\circ }\) tilt on the left side to avoid pressure on the inferior vena cava. ssFSE T2-weighted sequences are used to acquire stacks of images that are aligned to the main axes of the fetus. Usually three to six stacks are acquired for the whole womb and the placenta with a voxel size of \(1.25 \times 1.25 \times 2.50\,\text {mm}\). Dataset II contains 22 MR scans of healthy fetuses and fetuses with intrauterine fetal growth restriction (IUGR) at a gestational age of 20–38 weeks. The data were acquired with a 1.5T Philips MRI system using ssFSE sequences and a voxel size of \(0.8398 \times 0.8398 \times 4\,\text {mm}\). Ground truth labels for both datasets were obtained manually, slice by slice in 2D views, from the original motion-corrupted stacks by a clinical expert.

Experiments: The proposed segmentation framework is evaluated using three main metrics: the Dice similarity coefficient to measure the accuracy of the segmentation, the absolute volume difference to measure the volumetric error between the segmented and the ground truth volumes, and the average Hausdorff distance as a distance error metric between the segmented and the ground truth surfaces.
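The three metrics can be sketched as follows. The exact definitions used in the evaluation are not spelled out in the text, so this sketch assumes common ones: Dice as twice the overlap divided by the sum of both volumes, absolute volume difference as a percentage of the ground truth volume, and average Hausdorff distance as the mean of the two directed average nearest-neighbour distances. For brevity the distance is computed on all mask voxels, in voxel units, rather than on extracted surfaces.

```python
import numpy as np

def dice(seg, gt):
    """Dice similarity coefficient between two binary masks (1.0 = perfect)."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    denom = seg.sum() + gt.sum()
    return 2.0 * np.logical_and(seg, gt).sum() / denom if denom else 1.0

def abs_volume_difference(seg, gt):
    """Absolute volume difference as a percentage of the ground truth volume."""
    v_seg, v_gt = int(seg.astype(bool).sum()), int(gt.astype(bool).sum())
    return 100.0 * abs(v_seg - v_gt) / v_gt

def average_hausdorff(points_a, points_b):
    """Mean of the two directed average nearest-neighbour distances.

    Brute force, O(len(a) * len(b)) memory; fine for small point sets only.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Toy masks: the segmentation over-extends the ground truth by one layer.
gt = np.zeros((4, 4, 4), dtype=np.uint8)
gt[1:3, 1:3, 1:3] = 1            # 8 voxels
seg = np.zeros((4, 4, 4), dtype=np.uint8)
seg[1:3, 1:3, 1:4] = 1           # 12 voxels, 8 of them shared

dice_score = dice(seg, gt)                                        # 0.8
volume_error = abs_volume_difference(seg, gt)                     # 50.0
distance = average_hausdorff(np.argwhere(seg), np.argwhere(gt))   # 1/6 voxel
```

To report distances in millimetres, the voxel coordinates would have to be scaled by the voxel spacing before computing the distances.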

In a first experiment [exp-1], we evaluate the automatic segmentation of the placenta on Dataset I using 4-fold cross validation (11 test patients and 33 training patients per fold). The main aim of this experiment is to evaluate the performance of our segmentation framework on a healthy, homogeneous dataset. The results for this experiment are \(71.95\pm 19.79\,\%\) Dice, \(30.92\pm 33.68\,\%\) absolute volume difference, and \(4.94\pm 6.93\,\text {mm}\) average Hausdorff distance.
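A 4-fold split of 44 subjects into 11 test and 33 training subjects per fold can be generated as in the sketch below. How subjects were actually assigned to folds is not stated, so the random permutation, the seed, and the function name `k_fold_indices` are assumptions.

```python
import numpy as np

def k_fold_indices(n_subjects, k, seed=0):
    """Return a list of (train_indices, test_indices) pairs for k-fold CV."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_subjects)
    folds = np.array_split(order, k)
    splits = []
    for i in range(k):
        test = folds[i]
        # Every fold except the held-out one goes into the training set.
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, test))
    return splits

splits = k_fold_indices(44, 4)
```

Each subject appears in exactly one test fold, so the per-subject results of all folds can be pooled into one summary statistic.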

In a second experiment [exp-2], we train the CNN on all 44 subjects from Dataset I and test it on the 22 subjects from Dataset II. Datasets I and II differ significantly in scanners and scanning parameters. In addition, the gestational age range of the fetuses in Dataset II is wider, which has a large influence on the sizes of the fetal body and the placenta. This experiment therefore tests the performance of our framework on data from a different environment. The results of this experiment are \(56.78\pm 21.86\,\%\) Dice, \(48.19\pm 46.96\,\%\) absolute volume difference, and \(8.41\pm 7.1\,\text {mm}\) average Hausdorff distance.

To resemble a realistic transfer learning application, we have designed a third experiment [exp-3] using both datasets. The network is evaluated with 2-fold cross validation, using 10 test subjects from Dataset II and 44+10 training subjects from Dataset I and Dataset II. This experiment yielded a Dice accuracy of \(66.89\pm 15.35\,\%\), an absolute volume difference of \(33.05\pm 30.71\,\%\), and an average Hausdorff distance of \(5.8\pm 4.24\,\text {mm}\). Detailed results are shown in Fig. 4. Training one fold takes approximately 40 hours, and inference takes less than 2 minutes on an Nvidia Tesla K40.

Fig. 4.
Evaluation of the proposed method using (a) average Dice coefficient and (b) average Hausdorff distance. [exp-1] refers to 4-fold evaluation on Dataset I, [exp-2] uses Dataset I for training and Dataset II for testing, and [exp-3] mixes both datasets for training and uses unseen examples from Dataset II for testing.

Evaluation of clinical parameters: We compare our work to known values from clinical practice. Reference [8] reports that the average placental volume increases from 252.4 \(cm^{3}\) at 20 weeks to 1421.5 \(cm^{3}\) at 37 weeks. Figure 5 compares these values from the clinical literature [11] to the automatically measured volumes from our datasets. It shows that our approach achieves volumetric results very similar to both expert estimations and the clinical literature. In addition, the slope parameters of our segmentation and the ground truth are not significantly different (p-value 0.94). Pathological cases from Dataset II show differences to scans of healthy placentas in Fig. 5(c).
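As a rough worked example of what a slope means in this comparison, the two endpoint volumes quoted above imply the following average growth rate. This assumes linear growth between 20 and 37 weeks purely for illustration. It is not the linear estimation from [11], nor the fit shown in Fig. 5.

$$\begin{aligned} \frac{1421.5 - 252.4}{37 - 20} \, \frac{cm^{3}}{\text {week}} = \frac{1169.1}{17} \, \frac{cm^{3}}{\text {week}} \approx 68.8 \, \frac{cm^{3}}{\text {week}} \end{aligned}$$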

Fig. 5.
A graph comparing automatic segmentations (our approach), the ground truth (expert), and the linear estimations [11] of the placental volumes versus their gestational ages. (a) shows that our results from the first experiment [exp-1] using Dataset I are very close to both the expert and theoretical estimations. However, the third experiment [exp-3] using healthy subjects from Dataset II shows less consistency of the segmented volumes (b) due to the large dissimilarity of the test data. Moreover, (c) shows more inconsistency when testing on fetuses with IUGR from [exp-3]. Fetuses with unknown gestational age were excluded.

Motion compensation: Evaluating the quality of a motion compensated reconstruction is challenging due to the absence of the motion-free ground truth data. Assuming that the 2D in-plane patches from the original 3D stacks have no motion artifacts, reconstructed patches are evaluated using these motion-free 2D patches as ground truth. The average peak signal-to-noise ratio (PSNR) is calculated for all the patches of each subject. The baseline represents the quality of patches from non-native slice orientations, which is comparable to using a single motion corrupted stack for diagnostics with arbitrary cutting planes. The baseline has low PSNR values due to the motion artifacts between the input stacks. These values increase during the reconstruction iterations as a result of reducing the motion artifacts of the segmented placenta, see Fig. 6.
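The per-subject average PSNR described above can be sketched as follows, using the standard definition \(10\log_{10}(R^{2}/\text{MSE})\) with data range R. The function names and the choice of data range are assumptions, because the text does not state which peak value was used.

```python
import numpy as np

def psnr(reference, reconstructed, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized patches."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical patches
    return float(10.0 * np.log10(data_range ** 2 / mse))

def mean_patch_psnr(ref_patches, rec_patches, data_range):
    """Average PSNR over all patch pairs of one subject."""
    values = [psnr(r, q, data_range) for r, q in zip(ref_patches, rec_patches)]
    return float(np.mean(values))

# Toy example: a constant offset of 10% of the data range gives 20 dB.
ref = np.zeros((20, 20))
rec = ref + 25.5
value = psnr(ref, rec, data_range=255.0)
subject_mean = mean_patch_psnr([ref, ref], [rec, rec], data_range=255.0)
```

Higher values indicate that a reconstructed patch is closer to the in-plane patch taken as motion-free reference.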

Fig. 6.
A comparison between the 3D reconstructed placenta using superpixel-based PVR reconstructions [3], with an initial superpixel size of \(20 \times 20\) pixels, and a baseline that directly uses 2D slices from the input stacks for multi-planar examination. Stacks with high motion artifacts were excluded.

4 Discussion and Conclusion

We present a fully automatic segmentation framework for the human placenta from motion-corrupted fetal MRI scans. We perform experiments on two different datasets to thoroughly evaluate the presented segmentation approach, which is based on a 3D deep multi-scale convolutional neural network combined with conditional random field segmentation refinement. Our experiments show that this framework can cope with motion artifacts, achieving a Dice accuracy of \(71.95\,\%\) for healthy fetuses. It is also capable of segmenting the placenta from dissimilar data, achieving a Dice accuracy of \(66.89\,\%\) for a cohort mixed with cases of intrauterine fetal growth restriction from different scanners. Moreover, we extend the scope of our framework to real clinical applications by compensating for motion artifacts using slice-to-volume registration techniques and by providing a novel standardized view into the placental structures using skeleton extraction and curved planar reformation. In future work we will investigate the potential use of the standardized placenta views for image-based classification and automatic detection of abnormalities.