Fetal Pose Estimation in Volumetric MRI Using a 3D Convolution Neural Network

Xu, Junshen; Zhang, Molin; Turk, Esra Abaci; Zhang, Larry; Grant, P. Ellen; Ying, Kui; Golland, Polina; Adalsteinsson, Elfar

doi:10.1007/978-3-030-32251-9_44

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11767)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

The performance and diagnostic utility of magnetic resonance imaging (MRI) in pregnancy are fundamentally constrained by fetal motion. Motion of the fetus, which is unpredictable and rapid on the scale of conventional imaging times, limits the set of viable acquisition techniques to single-shot imaging with severe compromises in signal-to-noise ratio and diagnostic contrast, and frequently results in unacceptable image quality. Surprisingly little is known about the characteristics of fetal motion during MRI. Here we propose and demonstrate methods that exploit a growing repository of MRI observations of the gravid abdomen, acquired at low spatial resolution but relatively high temporal resolution and over long durations (10–30 min). We estimate fetal pose per frame in MRI volumes of the pregnant abdomen via deep learning algorithms that detect key fetal landmarks. Evaluation shows that our framework achieves an average error of 4.47 mm and 96.4% accuracy (error less than 10 mm). Fetal pose estimation in MRI time series yields novel means of quantifying fetal movements in health and disease, and enables the learning of kinematic models that may enhance prospective mitigation of fetal motion artifacts during MRI acquisition.

J. Xu and M. Zhang contributed equally.

1 Introduction

Estimation of fetal pose from volumetric MRI in pregnancy has applications that include motion tracking and prospective artifact mitigation during diagnostic imaging, retrospective analysis and evaluation of movement by the fetus, as well as the establishment of kinematic models of fetal movement during MRI. Prior work in fetal motion includes methods that rely on simple indices for fetal motion analysis and quantification, such as the angle of the fetal body axes with respect to the maternal body [1] and maternal perception of fetal movements [2].

Although pose estimation for the (adult) human body is an established domain in computer vision [3], to the best of our knowledge, no work has demonstrated fetal pose estimation over time in MRI. In contrast to human pose estimation from 2D photography, fetal pose estimation requires predicting 3D pose from dense volumetric data, which increases the computational burden. Further complicating the task are the variable orientation of the fetus within the mother, rapid growth and change in fetal features over gestational age, and poor-quality observations of ground truth pose.

In pose estimation, methods built on handcrafted features, such as graphical models and tree-based methods, typically suffer from low accuracy and low processing speed, whereas recent developments in deep learning have demonstrated great success in computer vision, aided by GPU acceleration and the capability to learn high-level features from data. Consequently, deep convolutional neural networks have also found their way into human pose estimation and achieved state-of-the-art results.

In an ongoing study of placental function by EPI BOLD imaging time series (see Fig. 1(a)), we have built an archive of over 70 subjects, each with 200–500 time frames of EPI volumes, imaged continuously over 10–30 min observation intervals and resulting in over 18,000 EPI volumes. By visual inspection, the fetal pose can be inferred from these data, but manual labeling of keypoints for pose estimation (see Fig. 1(b)) across all these volumes is prohibitive. We therefore propose a method based on deep neural networks to identify fetal keypoints.

We propose, demonstrate, and characterize the performance of a two-stage framework for fetal pose estimation in 3D MRI using deep learning. We first generate a heatmap for each fetal keypoint using a convolutional network and then infer fetal pose from the heatmaps using a Markov Random Field (MRF) that exploits anatomically plausible information about connections between keypoints. Evaluation shows that the proposed method achieves a mean error of 4.47 mm and a percentage of correct detection of 96.4%. Further, the computation time of our pipeline is less than 1 s/volume, which potentially enables low-latency tracking of fetal pose during diagnostic MRI in pregnancy.

2 Methods

2.1 Pose Estimation Framework

Building on the idea of heatmap prediction in human pose estimation [3], we propose a two-stage framework for fetal pose estimation in 3D MRI using deep learning (see Fig. 2). In the first stage, a CNN generates heatmaps from the input MR volume, which give per-pixel likelihoods for keypoints on the fetal skeleton. However, a generated heatmap may have multiple local maxima, and simply taking the location of maximum activation as the prediction may lead to low accuracy.

To address this problem, the second stage infers keypoint locations from the estimated heatmaps, exploiting the constraints of fetal pose to refine the results. We model the fetal pose as an MRF, where each keypoint of the fetus is represented by a node in the graph and the states of a node are the plausible locations of that keypoint. The final prediction is generated by performing inference on this MRF.

The following subsections describe the proposed framework in detail.

2.2 Heatmap Prediction Using CNN

Inspired by the successful application of hourglass networks in human pose estimation [3], we propose a 3D hourglass network for heatmap prediction of fetal keypoints. The overall architecture of the proposed network is shown in Fig. 3. The network is based on an encoder-decoder structure, motivated by the idea of capturing multi-scale information. In pose estimation, local evidence, e.g., local contrast, is important for identifying a keypoint, while global information, such as the fetus's orientation and the relative position of other joints or body parts, can help resolve ambiguity. At each scale of the network, residual blocks with 3D convolution layers are used to extract features. To recover the high-resolution information lost in the downscale-upscale structure, skip connections with element-wise addition connect symmetric scales.
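The data flow of such an hourglass can be illustrated at the level of array shapes. The following is only a sketch under stated assumptions: the residual blocks are replaced by an identity placeholder (`features`), pooling and upsampling use simple averaging and nearest-neighbour repetition, and the channel count and depth are hypothetical. It shows how an additive skip connection keeps the number of channels fixed, whereas concatenation would double it.

```python
import numpy as np

def downsample(x):
    """Halve each spatial axis of a (C, D, H, W) array by 2x2x2 average pooling."""
    c, d, h, w = x.shape
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(2, 4, 6))

def upsample(x):
    """Double each spatial axis by nearest-neighbour repetition."""
    for axis in (1, 2, 3):
        x = np.repeat(x, 2, axis=axis)
    return x

def features(x):
    """Placeholder for a residual block of 3D convolutions (identity here)."""
    return x

def hourglass_level(x, depth):
    """One recursive encoder-decoder level with an additive skip connection."""
    skip = features(x)                        # branch kept at the current scale
    if depth == 0:
        return skip
    low = hourglass_level(downsample(x), depth - 1)
    return skip + upsample(low)               # element-wise sum: channels unchanged

x = np.random.default_rng(0).normal(size=(8, 16, 16, 16))
out_add = hourglass_level(x, depth=2)
# For contrast: a concatenating skip connection stacks channels instead.
out_cat = np.concatenate([features(x), upsample(downsample(x))], axis=0)
print(out_add.shape, out_cat.shape)
```

With addition, every convolution that follows a skip connection sees the same number of input channels, which is one plausible reason such a design can stay compact.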

The CNN learns a mapping from MR images to target heatmaps, which are generated by placing a Gaussian distribution with $\sigma =2$ at the ground-truth position of each keypoint and stacking the results. The output heatmaps therefore have the same spatial dimensions as the input but J channels, where J is the number of keypoints to predict. The loss function used for training is the mean-squared error (MSE) between the predicted and target heatmaps. Instead of the whole volume, 3D patches of size $64\times 64\times 64$ are used as input for training. This strategy reduces GPU memory usage, enabling mini-batch training. Since the network is fully convolutional, at inference time whole 3D MR volumes are fed into the network to generate full-scale heatmaps.
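The construction of the training targets can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the peak normalisation to 1, the reading of $\sigma$ in voxel units, and the example keypoints are assumptions.

```python
import numpy as np

def gaussian_heatmaps(keypoints, shape, sigma=2.0):
    """Return a (J, D, H, W) array with one Gaussian blob per keypoint.

    keypoints: (J, 3) voxel coordinates; shape: (D, H, W).
    The peak value is 1 at each keypoint (normalisation is an assumption).
    """
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    coords = np.stack(grids, axis=-1).astype(float)             # (D, H, W, 3)
    keypoints = np.asarray(keypoints, dtype=float)
    diff = coords[None] - keypoints[:, None, None, None, :]     # (J, D, H, W, 3)
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mse_loss(pred, target):
    """Mean-squared error between predicted and target heatmaps."""
    return float(np.mean((pred - target) ** 2))

# Two hypothetical keypoints in a 32 x 32 x 32 patch.
kps = np.array([[10, 12, 8], [20, 5, 16]])
target = gaussian_heatmaps(kps, shape=(32, 32, 32))
peak = np.unravel_index(target[0].argmax(), target[0].shape)
print(target.shape, peak)
```

Each channel peaks at its own keypoint, so a network that reproduces the target exactly would have zero MSE and its per-channel maximum would recover the labelled location.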

2.3 Location Estimation from Heatmap

Given the output heatmaps from the CNN, the second stage of the pose estimation framework estimates the location of each keypoint. Let $x_i$ and $H_i$ be the location and heatmap of the i-th keypoint, $i=1,...,J$, and let $x=(x_1,...,x_J)$. One simple way to infer keypoint positions from heatmaps is to take the location of maximum activation in each heatmap, i.e., $\hat{x}_i=\arg \max _{x_i} H_i(x_i)$. However, this method handles each keypoint independently and does not make use of the connections between keypoints, e.g., the distance between two joints should be constant if they are connected by a bone. To exploit these connections, we model the fetal pose as an MRF, where each keypoint corresponds to a node in the graph and connections between keypoints are represented as edges. The states $\mathcal {S}_i=\{x_i^{(1)}, ..., x_i^{(L)}\}$ of node i are the top-L local maxima in heatmap i. Our prediction of fetal pose is a particular configuration of the MRF, i.e., $\hat{x}\in \mathcal {S}_1\times \cdots \times \mathcal {S}_J$. Each configuration is assigned an energy, E(x), defined as

$$\begin{aligned} E(x)= \sum _{i=1}^J \varphi _i(x_i) + \sum _{(i,j)\in B}\phi _{i,j}(x_i, x_j) \end{aligned} \tag{1}$$

where B is the set of connections. A low energy of a configuration implies high probability. Therefore, inference is equivalent to finding the configuration with the lowest energy, $\hat{x}=\arg \min _{x\in \mathcal {S}_1\times \cdots \times \mathcal {S}_J} E(x)$.
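The candidate sets $\mathcal {S}_i$ can be built by scanning each heatmap for local maxima and keeping the L strongest. The sketch below is one possible way to do this with NumPy; the 26-neighbour definition of a local maximum, the function names and the synthetic two-peak heatmap are assumptions, since the text does not specify how local maxima are detected.

```python
import numpy as np

def top_local_maxima(heatmap, num_candidates=3):
    """Return up to `num_candidates` local maxima of a 3D heatmap.

    A voxel counts as a local maximum if it is >= all 26 neighbours
    (an assumed definition). Returns (coords, values) sorted by
    decreasing heatmap value.
    """
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones(heatmap.shape, dtype=bool)
    d, h, w = heatmap.shape
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dz == dy == dx == 0:
                    continue
                neighbour = padded[1 + dz:1 + dz + d,
                                   1 + dy:1 + dy + h,
                                   1 + dx:1 + dx + w]
                is_max &= heatmap >= neighbour
    coords = np.argwhere(is_max)
    values = heatmap[is_max]
    order = np.argsort(-values)[:num_candidates]
    return coords[order], values[order]

def blob(center, shape, sigma=2.0):
    """Synthetic Gaussian peak used only to build a toy heatmap."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    sq = sum((g - c) ** 2 for g, c in zip(grids, center))
    return np.exp(-sq / (2.0 * sigma ** 2))

shape = (16, 16, 16)
heatmap = blob((4, 4, 4), shape) + 0.6 * blob((11, 10, 9), shape)
coords, values = top_local_maxima(heatmap, num_candidates=2)
print(coords.tolist())
```

Keeping several candidates per keypoint, rather than only the strongest, is what allows the pairwise terms to overrule a spurious peak.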

Since the heatmap can be considered as a surrogate for the probability distribution of the corresponding keypoint, the unary term in the energy function E can be modeled as

$$\begin{aligned} \varphi _i(x_i)=-\log H_i(x_i) \end{aligned} \tag{2}$$

As for the pairwise term, we define $\phi _{i,j}$ as a quadratic function of $||x_i-x_j||_2$, the distance between keypoints i and j, so that deviations from the expected distance increase the energy:

$$\begin{aligned} \phi _{i,j}(x_i, x_j)=\frac{\alpha (||x_i-x_j||_2/r_t-\mu _{ij})^2}{\sigma _{ij}^2}, \end{aligned} \tag{3}$$

where $r_t$ is the mean bone length at gestational age t, so that $||x_i-x_j||_2/r_t$ can be regarded as the distance of two keypoints normalized by gestational age. $\mu _{ij}$ and $\sigma _{ij}^2$ are the mean and variance of the normalized distance, which are estimated from training data. $\alpha $ is the regularization weight. The optimization problem is solved by a belief propagation algorithm [4].
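To make the inference step concrete, the sketch below evaluates the energy and finds its minimiser by min-sum message passing, which is exact when the connection graph is a tree. Several things here are assumptions rather than the authors' method: the text uses a belief propagation implementation from [4] and does not state whether B is a tree; the pairwise term is taken as a positive penalty so that implausible bone lengths raise the energy; and the three-keypoint chain, candidate locations, heatmap values and distance statistics are invented for illustration.

```python
import itertools
import numpy as np

def unary(prob):
    """Unary energy -log H_i(x_i); the epsilon guards against log(0)."""
    return -np.log(np.maximum(prob, 1e-12))

def pairwise(xi, xj, mu, var, r_t=1.0, alpha=1.0):
    """Quadratic penalty on the age-normalised distance between two keypoints."""
    dist = np.linalg.norm(np.asarray(xi, float) - np.asarray(xj, float)) / r_t
    return alpha * (dist - mu) ** 2 / var

def energy(states, cands, probs, edges, stats, r_t=1.0, alpha=1.0):
    """Total energy E(x) of one configuration (one state index per node)."""
    total = sum(unary(probs[i][s]) for i, s in enumerate(states))
    for i, j in edges:
        mu, var = stats[(i, j)]
        total += pairwise(cands[i][states[i]], cands[j][states[j]], mu, var, r_t, alpha)
    return float(total)

def min_sum_tree(cands, probs, edges, stats, root=0, r_t=1.0, alpha=1.0):
    """Minimum-energy configuration by min-sum message passing on a tree."""
    nbrs = {i: [] for i in range(len(cands))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    best_child = {}

    def edge_cost(i, j):
        mu, var = stats[(i, j)] if (i, j) in stats else stats[(j, i)]
        return np.array([[pairwise(a, b, mu, var, r_t, alpha) for b in cands[j]]
                         for a in cands[i]])

    def collect(i, parent):
        # Minimum energy of the subtree below node i, for each state of i.
        cost = unary(np.asarray(probs[i], float))
        for j in nbrs[i]:
            if j != parent:
                total = edge_cost(i, j) + collect(j, i)[None, :]
                best_child[(i, j)] = total.argmin(axis=1)
                cost = cost + total.min(axis=1)
        return cost

    def assign(i, parent, states):
        # Back-track from the root, reading off each child's best state.
        for j in nbrs[i]:
            if j != parent:
                states[j] = int(best_child[(i, j)][states[i]])
                assign(j, i, states)

    root_cost = collect(root, None)
    states = [0] * len(cands)
    states[root] = int(root_cost.argmin())
    assign(root, None, states)
    return states, float(root_cost.min())

# Hypothetical chain hip(0) - knee(1) - ankle(2); distances in units of r_t.
cands = [np.array([[0, 0, 0]]),
         np.array([[10, 0, 0], [40, 0, 0]]),
         np.array([[20, 0, 0], [5, 5, 5]])]
probs = [np.array([0.9]), np.array([0.5, 0.6]), np.array([0.8, 0.3])]
edges = [(0, 1), (1, 2)]
stats = {(0, 1): (10.0, 4.0), (1, 2): (10.0, 4.0)}

states, e_min = min_sum_tree(cands, probs, edges, stats)
brute = min(itertools.product(*[range(len(c)) for c in cands]),
            key=lambda s: energy(s, cands, probs, edges, stats))
print(states, list(brute))
```

In this toy case the strongest heatmap response for the knee is the second candidate (0.6 versus 0.5), yet the minimum-energy configuration selects the first, because the second would imply bone lengths far from the expected value. The brute-force search over all configurations is included only as a cross-check and would not scale to 15 keypoints with several candidates each.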

3 Experiments and Results

3.1 Dataset

The data for this study consist of volumetric MRI time series from imaging of 70 mothers pregnant with singletons at gestational ages ranging from 25 to 35 weeks. MRIs were acquired on a 3T Skyra scanner (Siemens Healthcare, Erlangen, Germany). A multislice, single-shot, gradient echo EPI sequence was used for acquisition, with in-plane resolution of $3\times 3$ mm$^2$, slice thickness of 3 mm, mean matrix size = $120\times 120\times 80$, TR = 5–8 s, TE = 32–38 ms, and FA = 90$^{\circ }$. Each subject was scanned for 10 to 30 min.

Similar to the task of adult human pose estimation, we model the pose of a fetus with a set of keypoints. We chose fifteen keypoints (ankles, knees, hips, bladder, shoulders, elbows, wrists and eyes) to capture pose and labeled them manually, with a representative example shown in Fig. 1(b). These fifteen landmarks were selected because they capture gross fetal anatomy that is critical in subsequent motion analysis and because they present with adequate image contrast to be observed relatively robustly in the MR volumes, thus mitigating error and noise in labelling. In total, 1705 MR volumes were labelled: 1028 (${\sim }60\%$) for training, 240 (${\sim }15\%$) for validation and 437 (${\sim }25\%$) for testing, where the testing set consists of subjects different from those in the training and validation sets.

In order to improve the generalization capacity and avoid overfitting, several data augmentation techniques were used, including intensity scaling, 3D rotation and flipping.
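Because the labels are keypoint coordinates, any geometric augmentation has to be applied to the coordinates as well as to the image. The sketch below shows this for intensity scaling and random flips only. The scaling range, the flip probability and the function name are assumptions, and arbitrary-angle 3D rotation is omitted because it additionally requires interpolation of the volume.

```python
import numpy as np

def augment(volume, keypoints, rng, scale_range=(0.9, 1.1), flip_prob=0.5):
    """Random global intensity scaling and axis flips.

    volume: (D, H, W) array; keypoints: (J, 3) voxel coordinates.
    Returns the transformed volume and the matching keypoint coordinates.
    """
    volume = volume * rng.uniform(*scale_range)       # intensity scaling
    keypoints = np.array(keypoints, dtype=float)      # work on a copy
    for axis in range(3):
        if rng.random() < flip_prob:
            volume = np.flip(volume, axis=axis)
            # Mirror the coordinate along the flipped axis.
            keypoints[:, axis] = volume.shape[axis] - 1 - keypoints[:, axis]
    return volume, keypoints

rng = np.random.default_rng(0)
vol = np.zeros((8, 10, 12))
vol[2, 5, 7] = 1.0                                    # marker at a hypothetical keypoint
aug, kp = augment(vol, [[2, 5, 7]], rng, flip_prob=1.0)
print(kp[0], np.unravel_index(aug.argmax(), aug.shape))
```

If paired keypoints carry left/right labels, a mirror flip may also call for swapping those labels; that step is not shown here since the text does not describe it.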

3.2 Experimental Setup

All experiments were performed on a server with an Intel Xeon E5-1650 CPU, 128 GB RAM and an NVIDIA TITAN X GPU. Neural networks were implemented with TensorFlow and optimized using Adam with an initial learning rate of $5\times 10^{-3}$, weight decay of $1\times 10^{-4}$ and the restart strategy [5]. The networks were trained for 200 epochs. For the second stage, we set $L=3$ and $\alpha =1$.
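A restart strategy is commonly realised as cosine annealing of the learning rate with periodic warm restarts. The sketch below shows that schedule with the initial learning rate from the text; the cosine form, the restart period of 50 epochs and the minimum rate of zero are assumptions, as the text only states that a restart strategy was used.

```python
import math

def lr_with_restarts(epoch, base_lr=5e-3, min_lr=0.0, period=50):
    """Cosine-annealed learning rate that resets to base_lr every `period` epochs."""
    t_cur = epoch % period                      # epochs since the last restart
    cosine = 1.0 + math.cos(math.pi * t_cur / period)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine

# Learning rate at a few epochs of a 200-epoch run (hypothetical period of 50).
for epoch in (0, 25, 49, 50, 199):
    print(epoch, lr_with_restarts(epoch))
```

Within each period the rate decays smoothly from its initial value towards the minimum and then jumps back, which gives the optimizer repeated chances to leave poor regions of the loss surface.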

3.3 Results

In this section, we evaluate the proposed pipeline for fetal pose estimation. First, we evaluate the proposed 3D hourglass network (HG) with the location of maximum heatmap activation as the final prediction. For comparison, we also use 3D UNet [6], which has previously been applied to heatmap regression [7]. Finally, we examine the whole pipeline by combining CNN-based heatmap regression with the MRF; these models are denoted UNet-M and HG-M, respectively.

Several metrics are used for evaluation: (a) Percentage of Correct Keypoint (PCK), where a detected keypoint is considered correct if the distance between the predicted and the true keypoint is within a certain threshold, (b) mean error (in mm), i.e., the mean distance between the predicted and the ground-truth keypoint, and (c) median of error.
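These three metrics can be computed as in the sketch below. It assumes predictions and labels are given in voxel coordinates on the 3 mm isotropic grid described in the dataset section and that "within the threshold" means strictly less than it; the function name and the toy errors are illustrative.

```python
import numpy as np

def pose_metrics(pred, truth, voxel_size_mm=3.0, thresholds_mm=(5.0, 10.0)):
    """Mean error, median error and PCK for (N, J, 3) voxel coordinates."""
    diff_mm = (np.asarray(pred, float) - np.asarray(truth, float)) * voxel_size_mm
    err = np.linalg.norm(diff_mm, axis=-1)            # (N, J) distances in mm
    result = {"mean_mm": float(err.mean()), "median_mm": float(np.median(err))}
    for t in thresholds_mm:
        result[f"pck@{t:g}mm"] = float((err < t).mean())
    return result

# One volume, four keypoints, with errors of 0, 3, 6 and 12 mm.
truth = np.zeros((1, 4, 3))
pred = np.array([[[0, 0, 0], [1, 0, 0], [0, 2, 0], [4, 0, 0]]])
metrics = pose_metrics(pred, truth)
print(metrics)
```

In the toy case, two of four keypoints fall within 5 mm and three of four within 10 mm, so the mean error is pulled up by the single 12 mm outlier while the median is not.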

Table 1. Mean and median of error of different models.

Table 2. Computation time and number of parameters of different networks.

Figure 4 shows PCK at two thresholds, 5 mm (1.67 pixels) and 10 mm (3.33 pixels), while the mean and median errors of the different models are listed in Table 1. With the proposed pipeline, 96.4% of the keypoints are located correctly (error < 10 mm) and the mean distance between predicted and ground-truth keypoints is 4.47 mm (1.5 pixels). On average, the proposed 3D hourglass network performs similarly to 3D UNet. However, as shown in Table 2, UNet has 6 times as many parameters as the hourglass network, indicating that the proposed network is more compact and efficient. The main reason is that the hourglass network uses element-wise summation instead of concatenation in its skip connections and fixes the number of channels across scales.

We also observe that the second-stage MRF refinement improves on CNN heatmap regression in terms of both PCK and mean error. As illustrated in Fig. 5(b), fetal pose estimation based on the location of maximum heatmap activation may produce anatomically implausible predictions. Such errors are corrected in the MRF refinement by trading off prior information about keypoint connections against the heatmaps generated by the CNN.

As for computation time, the proposed 3D hourglass network runs at 225 ms/volume on a GPU, and solving the optimization problem for inferring keypoint locations from heatmaps takes 290 ms/volume on a CPU. The end-to-end processing time of the whole pipeline is therefore less than 1 s/volume, shorter than the temporal resolution of the current fetal MR protocol, which potentially enables low-latency tracking of fetal pose in fetal MR imaging.
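The unit conversions and the latency claim follow from the 3 mm voxel size and the two per-stage timings reported in the text. As a worked check, ignoring any data-transfer overhead between GPU and CPU, which is not reported:

$$\begin{aligned} \frac{5\ \text {mm}}{3\ \text {mm/pixel}}&\approx 1.67\ \text {pixels},\quad \frac{10\ \text {mm}}{3\ \text {mm/pixel}}\approx 3.33\ \text {pixels},\quad \frac{4.47\ \text {mm}}{3\ \text {mm/pixel}}\approx 1.49\ \text {pixels},\\ t_{\text {total}}&\approx 225\ \text {ms}+290\ \text {ms}=515\ \text {ms}<1\ \text {s}<\text {TR}=5\text {–}8\ \text {s}. \end{aligned}$$

Under these figures, roughly half a second per volume leaves several seconds of margin within each repetition time.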

4 Conclusions

In this work, we proposed a two-stage deep learning framework for fetal pose estimation in 3D MRI. The proposed method achieves a mean error of 4.47 mm ($\sim $1.5 pixels) and a percentage of correct detection of 96.4%, which indicates that deep neural networks are able to identify key features for fetal pose estimation from time frames in low-resolution, volumetric EPI data from pregnant mothers. Further, the total processing time of the proposed framework is less than 1 s per volume, potentially enabling low-latency tracking of fetal pose in fetal MR imaging. One limitation of the current method is that the pipeline was trained only on singleton pregnancies. In addition, pose detection was performed on each time frame in isolation, without exploiting temporal correlations in the MR series. In future work, the proposed framework could be extended to multiple pregnancies and to exploit temporal correlations across volumes in a time sequence.

Overall, the proposed pipeline could be deployed for fetal motion estimation during MR scanning of pregnant mothers, with applications to fetal health and disease, the establishment of kinematic models of fetal motion, and prospective motion correction with slice-prescription updates for more robust diagnostic fetal and maternal MRI.

References

1. Biglari, H., Sameni, R.: Fetal motion estimation from noninvasive cardiac signal recordings. Physiol. Meas. 37(11), 2003 (2016)
2. Heazell, A.P., Frøen, J.: Methods of fetal movement counting and the detection of fetal compromise. J. Obstet. Gynaecol. 28(2), 147–154 (2008)
3. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29
4. Schmidt, M.: UGM: Matlab code for undirected graphical models (2012). http://www.di.ens.fr/mschmidt/Software/UGM.html
5. Loshchilov, I., Hutter, F.: Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101 (2017)
6. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_49
7. Payer, C., Štern, D., Bischof, H., Urschler, M.: Regressing heatmaps for multiple landmark localization using CNNs. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 230–238. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_27

Acknowledgements

This research was supported by NIH U01HD087211, NIH R01EB01733 and NIH NIBIB NAC P41EB015902.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
Junshen Xu, Polina Golland & Elfar Adalsteinsson
Department of Engineering Physics, Tsinghua University, Beijing, China
Molin Zhang & Kui Ying
Fetal-Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital, Boston, MA, USA
Esra Abaci Turk & P. Ellen Grant
Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
Larry Zhang & Polina Golland
Harvard Medical School, Boston, MA, USA
P. Ellen Grant
Institute for Medical Engineering and Science, MIT, Cambridge, MA, USA
Elfar Adalsteinsson

Corresponding author

Correspondence to Junshen Xu.

Editor information

Editors and Affiliations

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Dinggang Shen
University of Georgia, Athens, GA, USA
Tianming Liu
Western University, London, ON, Canada
Terry M. Peters
Yale University, New Haven, CT, USA
Lawrence H. Staib
University of Strasbourg, Illkirch, France
Caroline Essert
United Imaging Intelligence, Shanghai, China
Sean Zhou
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Pew-Thian Yap
Western University, London, ON, Canada
Ali Khan

Cite this paper

Xu, J. et al. (2019). Fetal Pose Estimation in Volumetric MRI Using a 3D Convolution Neural Network. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Lecture Notes in Computer Science, vol 11767. Springer, Cham. https://doi.org/10.1007/978-3-030-32251-9_44

DOI: https://doi.org/10.1007/978-3-030-32251-9_44
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32250-2
Online ISBN: 978-3-030-32251-9
eBook Packages: Computer Science (R0)
