DeU-Net 2.0: Enhanced deformable U-Net for 3D cardiac cine MRI segmentation

https://doi.org/10.1016/j.media.2022.102389

Highlights

  • A novel architecture, DeU-Net 2.0, comprising a Temporal Deformable Aggregation Module (TDAM), an Enhanced Deformable Attention Network (EDAN) and a Probabilistic Noise Correction Module (PNCM), is proposed for 3D cardiac cine MRI segmentation (average Dice score > 95% on the Extended ACDC dataset).

  • TDAM takes in consecutive MR slices as inputs to enhance the feature quality of the target slice by an offset prediction network and a temporal deformable convolutional layer.

  • EDAN exploits a pyramidal architecture and highly flexible deformable convolutional layers for accurate and robust segmentation.

  • In EDAN, we also propose a Multi-Scale Attention Module (MSAM) to exploit multi-scale self-similarity and capture useful correspondences at different scales.

  • PNCM considers two feature vectors as the normal feature and a random variable drawn from a standard Gaussian distribution by a fully connected layer, to alleviate the impact of noisy samples.

Abstract

Automatic segmentation of cardiac magnetic resonance imaging (MRI) facilitates efficient and accurate volume measurement in clinical applications. However, due to anisotropic resolution, ambiguous borders and complicated shapes, existing methods suffer from degraded accuracy and robustness in cardiac MRI segmentation. In this paper, we propose an enhanced Deformable U-Net (DeU-Net) for 3D cardiac cine MRI segmentation, composed of three modules, namely the Temporal Deformable Aggregation Module (TDAM), the Enhanced Deformable Attention Network (EDAN), and the Probabilistic Noise Correction Module (PNCM). TDAM first takes consecutive cardiac MR slices (including a target slice and its neighboring reference slices) as input, and extracts spatio-temporal information with an offset prediction network to generate fused features of the target slice. The fused features are then fed into EDAN, which exploits several flexible deformable convolutional layers and generates clear borders for every segmentation map. A Multi-Scale Attention Module (MSAM) in EDAN is proposed to capture long-range dependencies between features of different scales. Meanwhile, PNCM treats the fused features as a distribution to quantify uncertainty. Experimental results show that our DeU-Net achieves state-of-the-art performance in terms of the commonly used evaluation metrics on the Extended ACDC dataset and competitive performance on two other datasets, validating the robustness and generalization of DeU-Net.

Introduction

Cardiovascular disease (CVD) has become one of the major causes of death in Europe, with 3.9 million confirmed deaths (Sanz et al., 2020), according to the World Health Organization (WHO). This has driven extensive interest in CVD research, which aims to recognize phenotypes, estimate disease risks, and provide patient therapeutic interventions (Blankstein, 2012, Guo, Ng, Goubran, Petersen, Piechnik, Neubauer, Wright, 2020). With the significant technological advances in digital imagery, a great number of approaches have been proposed and exploited to improve the diagnosis of CVD with various modern medical imaging techniques, such as ultrasound imaging, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). Among them, MRI has been widely used by cardiologists as a non-invasive gold-standard modality for the qualitative and quantitative assessment (Vick III, 2009) of cardiac structures and functions because of its excellent image quality.

The segmentation of kinetic MR images along the short axis is complicated but essential for the precise morphological and pathological analysis, diagnosis, and surgical planning of CVD. In particular, one has to delineate the left ventricular endocardium (LV), myocardium (Myo), and right ventricular endocardium (RV) to calculate the volume of cavities in cardiac MRI, including the end-diastolic (ED) and end-systolic (ES) phases (Peng et al., 2016). Automatic and accurate cardiac MRI segmentation is a highly efficient and effective way to save doctors' time and eliminate ambiguities from human intervention. However, it still remains unsuitable for clinical applications (Guo, Ng, Goubran, Petersen, Piechnik, Neubauer, Wright, 2020, Zhuang, 2013), due to the following characteristics of 3D cardiac MRI: (1) significant shape variations of cardiac structures; (2) inhomogeneous intensity in imaging and ambiguous borders; (3) motion/blood flow artefacts and partial volume effects; (4) complicated shapes and unbalanced sizes of cardiac structures.

Considering the great advances of deep learning in widespread vision tasks (Chang, Lan, Cheng, Wei, 2020, Dong, Yang, Fu, Tian, Zhuo, 2021, Choi, Kwon, Lee, 2019, Yan, Jiang, Shi, Zhuo, 2020), e.g., recognition, detection, and tracking, deep learning based automatic or semi-automatic segmentation methods for cardiac MRI have been proposed and achieve good performance (Ronneberger, Fischer, Brox, 2015, Oktay, Schlemper, Folgoc, Lee, Heinrich, Misawa, Mori, McDonagh, Hammerla, Kainz, et al., 2018, Zheng, Yang, Han, Zhang, Liang, Zhao, Wang, Chen, 2019, Zotti, Luo, Humbert, Lalande, Jodoin, 2017, Khened, Alex, Krishnamurthi, 2017, Isensee, Jaeger, Full, Wolf, Engelhardt, Maier-Hein, 2017, Baumgartner, Koch, Pollefeys, Konukoglu, 2017). For example, a dilated convolutional neural network (CNN) (Wolterink et al., 2016) was trained on three orthogonal planes to capture context relations among images, and further work was performed in (Wolterink et al., 2017). In addition, Zheng et al. (2018) employed pre/post-processing and incorporated prior knowledge into a CNN to improve segmentation consistency. For real-time 3D MRI segmentation, Wang et al. (2019a) proposed the Multiscale Statistical U-Net (MSU-Net) by employing a statistical CNN. A new Independent Component Analysis U-Net (ICAU-Net) (Wang et al., 2020) not only achieved highly precise 3D cardiac MRI segmentation, but also obtained higher throughput and lower latency than MSU-Net.

In this paper, we propose a novel deep neural network architecture, referred to as Deformable U-Net (DeU-Net), to address some issues associated with 3D cardiac cine MRI segmentation. Our model contains the following three parts: the Temporal Deformable Aggregation Module (TDAM), the Enhanced Deformable Attention Network (EDAN), and the Probabilistic Noise Correction Module (PNCM):

  • Temporal Deformable Aggregation Module (TDAM) consists of an offset prediction network and a temporal deformable convolutional layer. TDAM takes in consecutive cardiac MR slices, including a target slice and its neighboring reference slices, as inputs to predict the deformable offsets by the offset prediction network. Then, the offsets which integrate spatial and temporal information are applied to the target slice by the temporal deformable convolutional layer, generating the fused features of the target slice.

  • Enhanced Deformable Attention Network (EDAN) contains several flexible deformable convolutional layers and a Multi-Scale Attention Module (MSAM), with a pyramidal and cascading architecture. The fused features produced by TDAM are first fed into EDAN to generate features at different scales. Then features at lower scales are aligned with coarse estimations, and the deformable offsets and aligned features are propagated to higher scales for precise cardiac MRI segmentation. In addition, MSAM is proposed to embrace external statistics and capture long-range dependencies between features of different scales.

  • Probabilistic Noise Correction Module (PNCM) adopts a two-stream CNN with shared weights. The fused features of the target slice, as the inputs of PNCM, are modeled as a distribution characterized by its mean and variance to account for the feature uncertainty.
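
As a rough illustration of the aggregation idea in the TDAM item above, the sketch below fuses several slices by sampling each one at offset positions and summing the weighted samples. It is a simplification and not the authors' implementation: the offsets are passed in rather than predicted by a network, a single sampling tap per slice stands in for a full deformable kernel, and the names bilinear_sample, fuse_slices and the scalar weights are hypothetical.

```python
import math

def bilinear_sample(img, y, x):
    """Sample a 2D list-of-lists image at a fractional (y, x); zero outside."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours with bilinear weights.
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy][xx]
    return value

def fuse_slices(slices, offsets, weights):
    """Fuse T slices into one feature map for the target slice.

    slices:  list of T images, each an H x W list of lists
    offsets: list of T maps, offsets[t][y][x] = (dy, dx)
             (assumed given here; predicted by a network in the paper)
    weights: list of T scalars (stand-in for learned kernel weights)
    """
    h, w = len(slices[0]), len(slices[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for img, off, wt in zip(slices, offsets, weights):
        for y in range(h):
            for x in range(w):
                dy, dx = off[y][x]
                fused[y][x] += wt * bilinear_sample(img, y + dy, x + dx)
    return fused

# Two 1 x 2 slices; the first is sampled half a pixel to the right.
slices = [[[0.0, 2.0]], [[4.0, 4.0]]]
zero = [[(0.0, 0.0), (0.0, 0.0)]]
half = [[(0.0, 0.5), (0.0, 0.5)]]
fused = fuse_slices(slices, [half, zero], [0.5, 0.5])
```

With all offsets set to zero the function reduces to a plain weighted sum of the slices; non-zero offsets let each output position borrow information from a shifted location in a neighboring slice.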

This paper is an extended version of our conference paper (Dong et al., 2020) (referred to as DeU-Net 1.0 in this paper) with additional contributions and substantial improvements, and is thus named DeU-Net 2.0 (called DeU-Net in the following for simplicity). The extended contributions are summarized as follows:

  • We replace the deformable attention U-Net (including the Deformable Global Position Attention Module) (Dong et al., 2020) with a new architecture, referred to as EDAN, which also contains a new module, MSAM. EDAN fully exploits spatio-temporal information in the fused features of the target slice through several deformable convolutional layers and guarantees precise and continuous borders for every segmentation map. The proposed MSAM investigates multi-scale feature correlation and captures useful correspondences at different scales, resulting in better segmentation performance.

  • We propose a novel module named PNCM in which the feature embedding (mean) and uncertainty (variance), drawn from a Gaussian distribution, are learned simultaneously. A significant variance produced by PNCM indicates that it is uncertain about which category the pixels in the target slice should be assigned to. Moreover, a novel uncertainty loss is proposed to prevent the model from overfitting to noisy slices. Thus, the proposed PNCM alleviates the adverse effects of noisy samples and leads to better generalization.

  • We enlarge our experimental data to include not only the Extended ACDC dataset (Wang et al., 2019a) but also the ACDC dataset (Bernard et al., 2018) and the MM-WHS dataset (Zhuang et al., 2019) to validate the model generalization. The extensive experimental results quantitatively and qualitatively show that the proposed DeU-Net achieves state-of-the-art performance on commonly used metrics on the Extended ACDC dataset and competitive performance on the other two datasets.
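
The mean-and-variance view of PNCM described above resembles the reparameterized sampling used in variational autoencoders (D.P. Kingma et al., 2013, in the reference list). The sketch below assumes that form. This excerpt does not give the paper's exact equations or its uncertainty loss, so sample_feature and the log-variance parameterization are illustrative assumptions only.

```python
import math
import random

def sample_feature(mean, log_var, rng):
    """Return z = mean + sigma * eps elementwise, with eps ~ N(0, 1).

    mean, log_var: equal-length lists (hypothetical outputs of the two streams)
    rng: a random.Random instance, so that sampling is reproducible
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

rng = random.Random(0)
confident = sample_feature([1.0, -2.0], [-40.0, -40.0], rng)  # tiny variance
uncertain = sample_feature([1.0, -2.0], [2.0, 2.0], rng)      # large variance
```

When the variance is tiny the sample stays essentially at the mean, while a large variance spreads samples widely, which is one way to read the statement that a significant variance marks pixels whose category is uncertain.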

The rest of this paper is organized as follows. In Section 2, we discuss the main components of DeU-Net. In Section 3, we present the details of the experimental settings. The experimental results of our proposed architecture, compared with existing approaches, are shown in Section 4. In Section 5, we conduct extensive experiments to demonstrate the benefits of TDAM, EDAN and PNCM on the segmentation performance. Finally, we conclude this paper in Section 6.

Methods

In this section, we introduce the proposed DeU-Net for 3D cardiac cine MRI segmentation, which consists of three modules, i.e., Temporal Deformable Aggregation Module (TDAM), Enhanced Deformable Attention Network (EDAN), and Probabilistic Noise Correction Module (PNCM), as shown in Fig. 1. A sequence of consecutive cardiac MR slices, a target slice and its neighboring reference slices, is first fed into TDAM which consists of a temporal deformable convolution layer and an offset prediction

Experiments

In this section, we present the experimental results to validate the effectiveness of our proposed DeU-Net method. Note that the DeU-Net 2.0 approach is called DeU-Net in this paper, while the DeU-Net method of our conference paper (Dong et al., 2020) is referred to as DeU-Net 1.0 for comparison.

Results

In this section, we present the experimental results of our proposed method on the Extended ACDC dataset (Wang et al., 2019a), compared with state-of-the-art approaches. Note that, to demonstrate the generalization ability of DeU-Net 2.0, we further conduct experiments on the ACDC (Bernard et al., 2018) and the MM-WHS (Zhuang et al., 2019) datasets.
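
The Highlights quote an average Dice score. For reference, the Dice score of one label is twice the overlap between prediction and ground truth divided by the sum of their sizes. The sketch below computes it on flat label lists; the function name, the toy labels, and the convention of returning 1.0 when a label is absent from both inputs are assumptions and are not taken from the paper.

```python
def dice_score(pred, target, label):
    """Dice = 2 * |P ∩ G| / (|P| + |G|) for a single label."""
    p = [v == label for v in pred]
    g = [v == label for v in target]
    overlap = sum(a and b for a, b in zip(p, g))
    total = sum(p) + sum(g)
    # Assumed convention: a label missing from both inputs counts as a match.
    return 1.0 if total == 0 else 2.0 * overlap / total

# Toy example with two foreground labels (values are made up).
pred = [1, 1, 0, 2, 2, 0]
target = [1, 0, 0, 2, 2, 2]
scores = {label: dice_score(pred, target, label) for label in (1, 2)}
```

An average Dice score such as the one reported would then be the mean of the per-label scores over the structures of interest and over the test cases.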

Discussions

In this section, we further conduct ablation studies to numerically analyze the benefits of the three key modules of the proposed DeU-Net 2.0, i.e., TDAM, EDAN, and PNCM, as listed in Table 4. Note that model (ix) represents DeU-Net 2.0.

Summary

In this paper, we propose an enhanced Deformable U-Net (DeU-Net) for 3D cardiac cine MRI segmentation, including the following three parts: Temporal Deformable Aggregation Module (TDAM), Enhanced Deformable Attention Network (EDAN), and Probabilistic Noise Correction Module (PNCM). A sequence of consecutive cardiac MR slices, a target slice and its neighboring reference slices, is first fed into TDAM, which consists of a temporal deformable convolutional layer and an offset prediction network

CRediT authorship contribution statement

Shunjie Dong: Investigation, Methodology, Software, Visualization, Writing – original draft. Zixuan Pan: Data curation, Software, Validation, Resources, Writing – review & editing. Yu Fu: Data curation, Resources, Writing – original draft, Funding acquisition, Writing – review & editing. Qianqian Yang: Conceptualization, Supervision, Writing – original draft, Funding acquisition, Writing – review & editing. Yuanxue Gao: Data curation, Writing – review & editing. Tianbai Yu: Data curation,

Declaration of Competing Interest

The authors confirm that there are no conflicts of interest.

Acknowledgements

This work was supported by grants from the National Science Foundation of China (No. 62034007 and No. 62141404), the Zhejiang Provincial Innovation Team Project under No. 2020R01001, the Fundamental Research Funds for the Central Universities under Grant 2021FZZX001-20, and the Zhejiang Lab’s International Talent Fund for Young Professionals under No. ZJ2020JS013.

References (53)

  • J. Deng et al. Spatio-temporal deformable convolution for compressed video quality enhancement. Proceedings of the AAAI conference on artificial intelligence (2020)

  • S. Dong et al. Rconet: deformable mutual information maximization and high-order uncertainty-aware learning for robust covid-19 detection. IEEE Trans Neural Netw Learn Syst (2021)

  • S. Dong et al. Deu-net: Deformable u-net for 3d cardiac mri video segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention (2020)

  • D. Glasner et al. Super-resolution from a single image. 2009 IEEE 12th International Conference on Computer Vision (2009)

  • M. Habijan et al. Whole heart segmentation using 3d fm-pre-resnet encoder–decoder based architecture with variational autoencoder regularization. Applied Sciences (2021)

  • F. Isensee et al. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. International workshop on statistical atlases and computational models of the heart (2017)

  • Y. Jang et al. Automatic segmentation of LV and RV in cardiac MRI. International Workshop on Statistical Atlases and Computational Models of the Heart (2017)

  • L. Joskowicz et al. Inter-observer variability of manual contour delineation of structures in ct. Eur Radiol (2019)

  • A. Kappeler et al. Video super-resolution with convolutional neural networks. TCI (2016)

  • A. Karpathy et al. Large-scale video classification with convolutional neural networks. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (2014)

  • M. Khened et al. Densely connected fully convolutional network for short-axis cardiac cine mr image segmentation and heart diagnosis using random forest. International Workshop on Statistical Atlases and Computational Models of the Heart (2017)

  • D.P. Kingma et al. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)

  • W. Li et al. Mucan: Multi-correspondence aggregation network for video super-resolution. European Conference on Computer Vision (2020)

  • D. Liu et al. Non-local recurrent network for image restoration. arXiv preprint arXiv:1806.02919 (2018)

  • O. Oktay et al. Attention u-net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)

  • J. Patravali et al. 2d-3d fully convolutional neural networks for cardiac mr segmentation. International Workshop on Statistical Atlases and Computational Models of the Heart (2017)