Abstract
Accurate segmentation of the right ventricle (RV) in cardiac magnetic resonance (CMR) images is crucial for assessing ventricular structure and function. However, due to its variable anatomy and ill-defined borders, RV segmentation remains an open problem. While recent advances in deep learning show great promise in tackling these challenges, such methods are typically developed on homogeneous datasets, not reflecting realistic clinical variation in image acquisition and pathology. In this work, we develop a model aimed at segmenting all three cardiac structures in a multi-center, multi-disease and multi-view setting, using data provided by the M&Ms-2 challenge. We propose a pipeline addressing various aspects of segmenting heterogeneous data, consisting of heart region detection, augmentation through image synthesis and multi-fusion segmentation. Our extensive experiments demonstrate the importance of the different elements of the pipeline, achieving competitive results for RV segmentation in both short-axis and long-axis MR images.
Y. Al Khalil and S. Amirrajab contributed equally.
References
Abbasi-Sureshjani, S., Amirrajab, S., Lorenz, C., Weese, J., Pluim, J., Breeuwer, M.: 4D semantic cardiac magnetic resonance image synthesis on XCAT anatomical model. In: Medical Imaging with Deep Learning, pp. 6–18. PMLR (2020)
Amirrajab, S., et al.: XCAT-GAN for synthesizing 3D consistent labeled cardiac MR images on anatomically variable XCAT phantoms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 128–137 (2020)
Attili, A.K., Schuster, A., Nagel, E., Reiber, J.H., van der Geest, R.J.: Quantification in cardiac MRI: advances in image acquisition and processing. Int. J. Cardiovasc. Imaging 26(1), 27–40 (2010)
Avendi, M.R., Kheradvar, A., Jafarkhani, H.: Automatic segmentation of the right ventricle from cardiac MRI using a learning-based approach. Magn. Reson. Med. 78(6), 2439–2448 (2017)
Bai, W., et al.: A probabilistic patch-based label fusion model for multi-atlas segmentation with registration refinement: application to cardiac MR images. IEEE Trans. Med. Imaging 32(7), 1302–1315 (2013)
Campello, V.M., et al.: Multi-centre, multi-vendor and multi-disease cardiac segmentation: the M&Ms challenge. IEEE Trans. Med. Imaging 40(12), 3543–3554 (2021)
Caudron, J., Fares, J., Vivier, P.H., Lefebvre, V., Petitjean, C., Dacher, J.N.: Diagnostic accuracy and variability of three semi-quantitative methods for assessing right ventricular systolic function from cardiac MRI in patients with acquired heart disease. Eur. Radiol. 21(10), 2111–2120 (2011)
Chen, C., et al.: Deep learning for cardiac image segmentation: a review. Front. Cardiovasc. Med. 7, 25 (2020)
Dolz, J., Desrosiers, C., Ayed, I.B.: IVD-net: intervertebral disc localization and segmentation in MRI with a multi-modal UNet. In: International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 130–143 (2018)
Grosgeorge, D., Petitjean, C., Caudron, J., Fares, J., Dacher, J.N.: Automatic cardiac ventricle segmentation in MR images: a validation study. Int. J. Comput. Assist. Radiol. Surg. 6(5), 573–581 (2011)
Grosgeorge, D., Petitjean, C., Dacher, J.N., Ruan, S.: Graph cut segmentation with a statistical shape model in cardiac MRI. Comput. Vis. Image Underst. 117(9), 1027–1035 (2013)
Haddad, F., Hunt, S.A., Rosenthal, D.N., Murphy, D.J.: Right ventricular function in cardiovascular disease, part I: anatomy, physiology, aging, and functional assessment of the right ventricle. Circulation 117(11), 1436–1448 (2008)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L.H., Aerts, H.J.: Artificial intelligence in radiology. Nat. Rev. Cancer 18(8), 500–510 (2018)
Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18(2), 203–211 (2021)
Li, B., Que, D.: Medical images denoising based on total variation algorithm. Procedia Environ. Sci. 8, 227–234 (2011)
Marchesseau, S., Ho, J.X., Totman, J.J.: Influence of the short-axis cine acquisition protocol on the cardiac function evaluation: a reproducibility study. Eur. J. Radiol. Open 3, 60–66 (2016)
Martin-Isla, C., et al.: Image-based cardiac diagnosis with machine learning: a review. Front. Cardiovasc. Med. 7, 1 (2020)
Nyúl, L.G., Udupa, J.K., Zhang, X.: New variants of a method of MRI scale standardization. IEEE Trans. Med. Imaging 19(2), 143–150 (2000)
Ou, Y., Doshi, J., Erus, G., Davatzikos, C.: Multi-atlas segmentation of the cardiac MR right ventricle. In: Proceedings of 3D Cardiovascular Imaging: A MICCAI Segmentation Challenge (2012)
Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2337–2346 (2019)
Petitjean, C., Zuluaga, M.A., et al.: Right ventricle segmentation from cardiac MRI: a collation study. Med. Image Anal. 19(1), 187–202 (2015)
Ringenberg, J., Deo, M., Devabhaktuni, V., Berenfeld, O., Boyers, P., Gold, J.: Fast, accurate, and fully automatic segmentation of the right ventricle in short-axis cardiac MRI. Comput. Med. Imaging Graph. 38(3), 190–201 (2014)
Rumsfeld, J.S., Joynt, K.E., Maddox, T.M.: Big data analytics to improve cardiovascular care: promise and challenges. Nat. Rev. Cardiol. 13(6), 350 (2016)
Scannell, C.M., et al.: Deep-learning-based preprocessing for quantitative myocardial perfusion MRI. J. Magn. Reson. Imaging 51(6), 1689–1696 (2020)
Shameer, K., Johnson, K.W., Glicksberg, B.S., Dudley, J.T., Sengupta, P.P.: Machine learning in cardiovascular medicine: are we there yet? Heart 104(14), 1156–1164 (2018)
Simon, M.A.: Assessment and treatment of right ventricular failure. Nat. Rev. Cardiol. 10(4), 204–218 (2013)
Wang, C.W., Peng, C.W., Chen, H.C.: A simple and fully automatic right ventricle segmentation method for 4-dimensional cardiac MR images. In: Proceedings of MICCAI RV Segmentation Challenge (2012)
Yan, W., Huang, L., Xia, L., et al.: MRI manufacturer shift and adaptation: increasing the generalizability of deep learning segmentation for MR images acquired with different scanners. Radiol. Artif. Intell. 2(4), e190195 (2020)
Yilmaz, P., Wallecan, K., Kristanto, W., Aben, J.P., Moelker, A.: Evaluation of a semi-automatic right ventricle segmentation method on short-axis MR images. J. Digit. Imaging 31(5), 670–679 (2018)
Zuluaga, M.A., Cardoso, M.J., Modat, M., Ourselin, S.: Multi-atlas propagation whole heart segmentation from MRI and CTA using a local normalised correlation coefficient criterion. In: Ourselin, S., Rueckert, D., Smith, N. (eds.) FIMH 2013. LNCS, vol. 7945, pp. 174–181. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38899-6_21
Acknowledgments
This research is a part of the openGTN project, supported by the European Union in the Marie Curie Innovative Training Networks (ITN) fellowship program under project No. 764465.
Appendices
Appendix A. Pre-processing Stage
A.1. Heart Detection Module
As presented in Sect. 2.2, the first stage of our pipeline is a heart region detection module, consisting of a regression-based neural network that locates and extracts the heart in both SA and LA images, similar to the approach used in [25]. Before generating the training labels, we resample all SA images to a median spatial resolution of 1.25 \(\times \) 1.25 \(\times \) 10 mm\(^{3}\) and all LA images to an in-plane resolution of 1.25 \(\times \) 1.25 mm\(^{2}\) before cropping. We use a simple CNN designed for a regression task, where the output consists of six continuous values. The inputs to the network are 2D (256 \(\times \) 256) mid-cavity slices extracted from the SA training volumes, together with all LA slices, normalized to intensity values in the range [0, 1]. The outputs are the parameters that define the bounding box, namely the x and y coordinates of the center of the initialized ROI and of its lower-left corner, as well as the scaling factors for the width and height of the initial ROI.
The CNN consists of five convolutional layers, followed by two fully-connected layers with a linear activation. Each convolutional layer uses 3 \(\times \) 3 kernels and is followed by a 2 \(\times \) 2 max-pooling layer. Batch normalization and leaky ReLU activations are used in each layer, except for the output. Dropout with a probability of 0.5 is used in the fully-connected layers. The network is trained for 2000 epochs with a batch size of 32 and early stopping (based on validation accuracy), minimizing the mean squared error between the computed transformation and the actual transformation (estimated from the ground truth) using the Adam optimizer. We start with an initial learning rate of 0.001 and decrease it by a factor of 0.5 every 250 epochs. All image dimensions and scaling/displacement parameters are normalized such that the generated translations lie in the range [−1, 1].
After prediction, all parameters are de-normalized to the original image scale. On-the-fly data augmentation is applied to the training images, consisting of random translation, rotation, scaling, vertical and horizontal flips, contrast augmentation and addition of noise. At inference time, we again use mid-cavity slices from the SA test images to obtain the ROI adjustment parameters. The bounding boxes predicted on mid-cavity SA slices are then propagated through the whole 3D volume from which these slices were extracted. This step is not needed for LA images, where direct detection is possible (both ED and ES LA images consist of a single slice only). The cropped SA and LA images obtained using the predicted bounding box are post-processed to a size of 128 \(\times \) 128 and \(176 \times 176\) voxels, respectively. These images are then used for training the cardiac cavity segmentation and synthesis networks.
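As an illustration, the de-normalization and volume-wide cropping step can be sketched as follows. The parameter layout and the helper name `crop_sa_volume` are assumptions for illustration; only the predicted ROI centre is used here, since the final crop size is fixed:

```python
import numpy as np

def crop_sa_volume(volume, params, out_size=128):
    """Crop every slice of an SA volume with the bounding box predicted
    on its mid-cavity slice. `params` holds the network outputs,
    normalized to [-1, 1]; this sketch uses only the predicted ROI
    centre (cx, cy), an assumed parameter layout, because the output
    crop size is fixed to out_size x out_size."""
    n_slices, h, w = volume.shape
    cx = (params[0] + 1.0) / 2.0 * w   # de-normalize centre x to pixels
    cy = (params[1] + 1.0) / 2.0 * h   # de-normalize centre y to pixels
    half = out_size // 2
    x0 = int(np.clip(cx - half, 0, w - out_size))
    y0 = int(np.clip(cy - half, 0, h - out_size))
    # the same in-plane box is propagated through the whole 3D stack
    return volume[:, y0:y0 + out_size, x0:x0 + out_size]
```

Clipping the corner coordinates keeps the crop inside the image even when the predicted centre lies close to the border.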
A.2. Appearance Transformations for Targeting Variation in Contrast and Intensity
One of the main challenges of deploying a segmentation algorithm on heterogeneous data is its performance in the presence of extensive contrast and intensity variations. Exploring the provided training and validation sets, we observe that not only does data acquired from different vendors vary in contrast, but the presence of pathology also largely influences tissue visibility and often occludes tissue boundaries. Applying image appearance transformations can improve both contrast and tissue visibility, as well as put more emphasis on tissue shape rather than appearance. To achieve this, we select a set of six transformations per image, where each is fed into a separate encoding path during the training of the late fusion model, namely:
1. Histogram standardization: We standardize image intensities to values representative of each scanner vendor, utilizing the algorithm in [19], which detects landmarks on the image histograms in the training set and averages them to form a standard landmark set per vendor. For a new image, the detected landmarks of its histogram are matched to the previously computed standard positions by linear interpolation of intensities between the landmarks. A similar approach is applied at inference time using landmarks calculated from the training data. Thus, for each image, we generate three counterparts, standardized to the landmarks extracted from GE-, Siemens- and Philips-acquired images.
2. Edge-preserving filtering: To emphasize the shape of the heart cavities and discard high-frequency features, we apply total variation filtering (TVF) to the original input image. TVF is typically used for denoising and produces images with flat domains separated by enhanced edges [16].
3. Solarization and posterization: Solarization can be described as a partial inversion of light and dark intensity values; total solarization yields the negative of the image. Posterization retains the general appearance of the image, but gradual transitions are replaced by abrupt changes in shading from one region to another. This emphasizes edges, flattens the image, and is typically used for contour tracing.
4. Laplacian filter: The Laplacian of an image highlights regions of rapid intensity change and is therefore often used for edge detection.
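Minimal sketches of several of these transformations are given below, assuming images normalized to [0, 1]. The landmark percentiles, solarization threshold and posterization levels are illustrative assumptions, and the TV filter is omitted for brevity:

```python
import numpy as np
from scipy import ndimage

PCTS = (1, 10, 25, 50, 75, 90, 99)  # assumed landmark percentiles

def standardize(img, ref_landmarks):
    """Nyul-style histogram standardization: map this image's
    percentile landmarks onto a vendor-specific reference set by
    piecewise-linear interpolation of intensities."""
    src = np.percentile(img, PCTS)
    return np.interp(img, src, ref_landmarks)

def solarize(img, thresh=0.5):
    """Partially invert intensities: pixels above the threshold
    are flipped, the rest are kept."""
    return np.where(img >= thresh, 1.0 - img, img)

def posterize(img, levels=4):
    """Quantize gradual transitions into a few flat shading levels."""
    return np.floor(img * levels) / levels

def laplacian(img):
    """Highlight regions of rapid intensity change."""
    return ndimage.laplace(img)
```

In practice, one reference landmark set would be computed per vendor from the training data, giving the three standardized counterparts described above.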
Appendix B. Image Synthesis Models
Two identical image synthesis models are trained, one on LA and one on SA cardiac MR images. To augment and balance the data using these trained synthesis models, the following strategies are devised:
i) For each vendor-specific subset, outlier cases are identified based on the end-diastolic or end-systolic RV volume, calculated from the ground-truth labels of the SA images. These outlier cases, separated from the rest of the population, are used for image synthesis by applying random label deformations. To balance the ratio, we apply a different number of deformations per case such that we eventually create 1000 synthesized cases per vendor, comprising 50% outliers and 50% cases from the rest of the population.
ii) Within each subject's SA stack, the number of basal slices is not balanced against the number of mid-ventricular and apical slices: there are typically 2–3 basal slices compared to 6–8 mid-ventricular and apical slices. Basal slices are therefore seen less frequently by the segmentation network during training, which could account for network failures on these challenging slices. To increase their occurrence, we take the labels of the three most basal slices of all cases and randomly deform them 10 times each for image synthesis.
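The volume-based outlier selection in strategy i) can be sketched as follows. The RV label index, the voxel spacing and the IQR-based outlier rule are illustrative assumptions, since the exact criterion is not specified:

```python
import numpy as np

RV_LABEL = 3  # assumed integer label for the RV in the ground truth

def rv_volume_ml(label_vol, spacing=(1.25, 1.25, 10.0)):
    """RV volume (in ml) from a ground-truth SA label stack:
    count RV voxels and multiply by the voxel volume in mm^3."""
    voxel_mm3 = float(np.prod(spacing))
    return np.count_nonzero(label_vol == RV_LABEL) * voxel_mm3 / 1000.0

def find_outliers(volumes, k=1.5):
    """Flag cases outside k * IQR of the vendor-specific RV-volume
    distribution; these are oversampled during synthesis.
    The 1.5 * IQR rule is an assumption for illustration."""
    q1, q3 = np.percentile(volumes, [25, 75])
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(volumes) if v < lo or v > hi]
```

Running `find_outliers` per vendor-specific subset separates the atypical hearts that the synthesis models then oversample via random label deformations.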
Appendix C. Cardiac Cavity Segmentation Architecture and Training Procedure
The late fusion U-Net segmentation model learns a separate convolutional encoder path for each transformed image, whose features are fused at the higher layers (the bottleneck). Here, we assume that the higher-level representations of the different transformations of an image are complementary to each other, while containing distinctive features that aid segmentation. Each encoding path consists of five convolutional blocks with four max-pooling layers. Each convolutional block consists of 3 \(\times \) 3 kernel convolutional layers, batch normalization and leaky ReLU activation. Batch normalization improves regularization and makes the network less susceptible to noise and intensity variation. Moreover, we apply dropout regularization, with a rate of 0.5, after each concatenation operation to further avoid over-fitting.
To increase robustness and cover a wide range of variations in heart pose and size, we additionally augment the training set. Namely, we apply random vertical and horizontal flips (p = 0.5), random rotation by integer multiples of \(\frac{\pi }{2}\) (p = 0.5), random scaling with a scale factor s \(\in \) [0.8, 1.2] (p = 0.2), random translations (p = 0.3) and mirroring (p = 0.5). All augmentations are applied on the fly during training. At inference time, besides normalization and in-plane re-sampling, we apply the set of six transformations to generate the six inputs to the model. After pre-processing, each encoding path is fed with batches of 144 images of size 128 \(\times \) 128 for training the SA segmentation model and batches of 64 images of size 256 \(\times \) 256 for the LA model. We use a validation set to track training progress and identify overfitting, where the same augmentation approach is applied to the validation set and the mean Dice score is calculated each epoch. To train the network, we use a weighted sum of the categorical cross-entropy and Dice loss. We use Adam for optimization, with an initial learning rate of 10\(^{-4}\) and a weight decay of 3 \(\cdot \) 10\(^{-5}\). During training, the learning rate is reduced by a factor of 5 if the validation loss does not improve by at least 5 \(\cdot \) 10\(^{-3}\) for 50 epochs. We apply early stopping on the validation set to avoid overfitting and select the model with the highest accuracy. We train each model (LA and SA) using five-fold cross-validation on the training cases and use the resulting models as an ensemble to predict on the validation or testing set. The training of all models runs for 1000 epochs.
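The combined segmentation loss can be sketched in plain NumPy as below, operating on softmax probabilities and one-hot labels of shape (N, C, H, W). The equal 0.5/0.5 weighting of the two terms is an assumption, since the exact weights are not specified:

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-6):
    """1 - mean soft Dice over classes; inputs shaped (N, C, H, W)."""
    inter = np.sum(probs * onehot, axis=(0, 2, 3))
    sums = np.sum(probs, axis=(0, 2, 3)) + np.sum(onehot, axis=(0, 2, 3))
    return 1.0 - np.mean((2.0 * inter + eps) / (sums + eps))

def cross_entropy(probs, onehot, eps=1e-12):
    """Categorical cross-entropy averaged over pixels."""
    return -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))

def combined_loss(probs, onehot, w_ce=0.5, w_dice=0.5):
    """Weighted sum of cross-entropy and Dice loss; the 0.5/0.5
    weighting is an illustrative assumption."""
    return w_ce * cross_entropy(probs, onehot) + w_dice * soft_dice_loss(probs, onehot)
```

A perfect prediction drives both terms to zero, while the Dice term keeps gradients informative for small structures (such as the RV on basal slices) that contribute little to the pixel-wise cross-entropy.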
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Al Khalil, Y., Amirrajab, S., Pluim, J., Breeuwer, M. (2022). Late Fusion U-Net with GAN-Based Augmentation for Generalizable Cardiac MRI Segmentation. In: Puyol Antón, E., et al. Statistical Atlases and Computational Models of the Heart. Multi-Disease, Multi-View, and Multi-Center Right Ventricular Segmentation in Cardiac MRI Challenge. STACOM 2021. Lecture Notes in Computer Science(), vol 13131. Springer, Cham. https://doi.org/10.1007/978-3-030-93722-5_39
Print ISBN: 978-3-030-93721-8
Online ISBN: 978-3-030-93722-5