DeU-Net 2.0: Enhanced deformable U-Net for 3D cardiac cine MRI segmentation
Introduction
Cardiovascular disease (CVD) has become one of the major causes of death in Europe, with 3.9 million confirmed deaths (Sanz et al., 2020), according to the World Health Organization (WHO). This has driven extensive interest in CVD research, which aims to recognize phenotypes, estimate disease risks, and provide therapeutic interventions for patients (Blankstein, 2012, Guo, Ng, Goubran, Petersen, Piechnik, Neubauer, Wright, 2020). With the significant technological advances in digital imaging, a great number of approaches have been proposed to improve the diagnosis of CVD with various modern medical imaging techniques, such as ultrasound imaging, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). Among them, MRI has been widely used by cardiologists as a non-invasive gold-standard modality for the qualitative and quantitative assessment (Vick III, 2009) of cardiac structures and functions because of its excellent image quality.
The segmentation of kinetic MR images along the short axis is complicated but essential for precise morphological and pathological analysis, diagnosis, and surgical planning of CVD. In particular, one has to delineate the left ventricular endocardium (LV), myocardium (Myo), and right ventricular endocardium (RV) to calculate the volume of the cavities in cardiac MRI at the end-diastolic (ED) and end-systolic (ES) phases (Peng et al., 2016). Automatic and accurate cardiac MRI segmentation is an efficient and effective way to save doctors' time and to eliminate the ambiguities introduced by human intervention. However, it remains unsuitable for clinical applications (Guo, Ng, Goubran, Petersen, Piechnik, Neubauer, Wright, 2020, Zhuang, 2013), due to the following characteristics of 3D cardiac MRI: (1) significant shape variations of cardiac structures; (2) inhomogeneous intensity in imaging and ambiguous borders; (3) motion/blood flow artefacts and partial volume effects; (4) complicated shapes and unbalanced sizes of cardiac structures.
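As a rough illustration of why the delineation matters, the volume of a structure at a given phase can be estimated by counting the voxels assigned to it and multiplying by the voxel volume. The sketch below is only an illustration: the label indices (1 = RV, 2 = Myo, 3 = LV), the voxel spacing, and the toy masks are hypothetical and are not taken from the paper.

```python
# Hypothetical label convention (an assumption, not taken from the paper).
LABELS = {"RV": 1, "Myo": 2, "LV": 3}


def structure_volume_ml(mask, label, spacing_mm):
    """Estimate the volume of one labelled structure in millilitres.

    mask: nested list indexed as mask[slice][row][col] holding integer labels.
    spacing_mm: (slice thickness, row spacing, column spacing) in millimetres.
    """
    voxel_count = sum(
        1 for sl in mask for row in sl for value in row if value == label
    )
    dz, dy, dx = spacing_mm
    voxel_volume_mm3 = dz * dy * dx
    return voxel_count * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Toy end-diastolic (ED) and end-systolic (ES) masks: 2 slices of 3x3 pixels.
ed_mask = [
    [[0, 3, 3], [2, 3, 3], [0, 2, 1]],
    [[0, 3, 3], [2, 3, 0], [0, 1, 1]],
]
es_mask = [
    [[0, 0, 3], [2, 3, 2], [0, 2, 1]],
    [[0, 0, 3], [2, 2, 0], [0, 1, 0]],
]
spacing = (10.0, 1.5, 1.5)  # hypothetical spacing in mm

ed_lv = structure_volume_ml(ed_mask, LABELS["LV"], spacing)
es_lv = structure_volume_ml(es_mask, LABELS["LV"], spacing)
print(f"LV volume at ED: {ed_lv:.4f} mL, at ES: {es_lv:.4f} mL")
```

Because the volume is a direct function of the voxel count, any boundary error in the segmentation map propagates linearly into the volume estimate, which is one reason accurate borders are emphasized.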
Considering the great advances of deep learning in a wide range of vision tasks (Chang, Lan, Cheng, Wei, 2020, Dong, Yang, Fu, Tian, Zhuo, 2021, Choi, Kwon, Lee, 2019, Yan, Jiang, Shi, Zhuo, 2020), e.g., recognition, detection, and tracking, deep learning based automatic or semi-automatic segmentation methods for cardiac MRI have been proposed and achieve good performance (Ronneberger, Fischer, Brox, 2015, Oktay, Schlemper, Folgoc, Lee, Heinrich, Misawa, Mori, McDonagh, Hammerla, Kainz, et al., 2018, Zheng, Yang, Han, Zhang, Liang, Zhao, Wang, Chen, 2019, Zotti, Luo, Humbert, Lalande, Jodoin, 2017, Khened, Alex, Krishnamurthi, 2017, Isensee, Jaeger, Full, Wolf, Engelhardt, Maier-Hein, 2017, Baumgartner, Koch, Pollefeys, Konukoglu, 2017). For example, a dilated convolutional neural network (CNN) in (Wolterink et al., 2016) was trained on three orthogonal planes to capture context relations among images, and this work was extended in (Wolterink et al., 2017). In addition, Zheng et al. (2018) employed pre/post-processing and incorporated prior knowledge into a CNN to improve segmentation consistency. For real-time 3D MRI segmentation, Wang et al. (2019a) proposed the Multiscale Statistical U-Net (MSU-Net) by employing a statistical CNN. A new Independent Component Analysis U-Net (ICAU-Net) in (Wang et al., 2020) not only achieved highly precise 3D cardiac MRI segmentation, but also obtained higher throughput and lower latency than MSU-Net.
In this paper, we propose a novel deep neural network architecture, referred to as Deformable U-Net (DeU-Net), to address some of the issues associated with 3D cardiac cine MRI segmentation. Our model contains the following three parts: Temporal Deformable Aggregation Module (TDAM), Enhanced Deformable Attention Network (EDAN), and Probabilistic Noise Correction Module (PNCM):
- Temporal Deformable Aggregation Module (TDAM) consists of an offset prediction network and a temporal deformable convolutional layer. TDAM takes consecutive cardiac MR slices, including a target slice and its neighboring reference slices, as inputs, and the offset prediction network predicts the deformable offsets from them. Then, the offsets, which integrate spatial and temporal information, are applied to the target slice by the temporal deformable convolutional layer, generating the fused features of the target slice.
- Enhanced Deformable Attention Network (EDAN) contains several flexible deformable convolutional layers and a Multi-Scale Attention Module (MSAM), with a pyramidal and cascading architecture. The fused features produced by TDAM are first fed into EDAN to generate features at different scales. Then, features at lower scales are aligned with coarse estimations, and the deformable offsets and aligned features are propagated to higher scales for precise cardiac MRI segmentation. In addition, MSAM is proposed to incorporate external statistics and capture long-range dependencies between features at different scales.
- Probabilistic Noise Correction Module (PNCM) adopts a two-stream CNN with shared weights. The fused features of the target slice, as the inputs of PNCM, are modeled as a distribution characterized by its mean and variance to account for the feature uncertainty.
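To make the idea behind TDAM more concrete, the following is a minimal sketch of deformable sampling across neighboring slices: each kernel tap on each slice is displaced by an offset, the slice is read at the displaced (possibly fractional) location by bilinear interpolation, and the weighted samples are summed into one fused value. It is not the authors' implementation. The function names, the 3x3 kernel, the weights, and the toy data are all assumptions, and the offsets are hand-picked here, whereas in the model they would be produced by the offset prediction network.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2D image (list of rows) at a fractional location.

    Locations outside the image contribute zero (zero padding).
    """
    height, width = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < height and 0 <= xi < width:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * image[yi][xi]
    return value


def deformable_fuse(slices, weights, offsets, y, x):
    """Fuse neighbouring slices at one output pixel (y, x).

    slices:  list of T images (the target slice and its reference slices).
    weights: weights[t][k] is the kernel weight of tap k on slice t.
    offsets: offsets[t][k] is the (dy, dx) shift of tap k on slice t
             (hand-picked here; predicted by a network in the real model).
    """
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]  # 3x3 grid
    out = 0.0
    for t, image in enumerate(slices):
        for k, (ky, kx) in enumerate(taps):
            dy, dx = offsets[t][k]
            out += weights[t][k] * bilinear_sample(image, y + ky + dy, x + kx + dx)
    return out


def make_slice(shift):
    """4x4 ramp image whose content is displaced by `shift` pixels along x."""
    return [[10.0 * r + (c - shift) for c in range(4)] for r in range(4)]


# Toy sequence: previous, target, and next slice with different displacements.
slices = [make_slice(1.0), make_slice(0.0), make_slice(0.5)]

CENTER = 4  # index of the (0, 0) tap in the 3x3 grid
weights = [[0.0] * 9 for _ in slices]
for t, w in enumerate((0.25, 0.5, 0.25)):
    weights[t][CENTER] = w  # only the centre tap is active in this toy case

zero_offsets = [[(0.0, 0.0)] * 9 for _ in slices]
aligned_offsets = [[(0.0, shift)] * 9 for shift in (1.0, 0.0, 0.5)]

naive = deformable_fuse(slices, weights, zero_offsets, 1, 1)
fused = deformable_fuse(slices, weights, aligned_offsets, 1, 1)
target_value = slices[1][1][1]
print(naive, fused, target_value)
```

In this toy case, offsets that match the displacement of each reference slice make the fused value agree with the target slice, while zero offsets blend misaligned content. This is the motivation for learning the offsets rather than fixing the sampling grid.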
This paper is an extended version of our conference paper (Dong et al., 2020) (referred to as DeU-Net 1.0 in this paper) with additional contributions and substantial improvements, and is thus named DeU-Net 2.0 (called DeU-Net in the following for simplicity). The extended contributions are summarized as follows:
- We replace the deformable attention U-Net (including the Deformable Global Position Attention Module) (Dong et al., 2020) with a new architecture, referred to as EDAN, which also contains a new module, MSAM. EDAN fully exploits the spatio-temporal information in the fused features of the target slice through several deformable convolutional layers and guarantees precise and continuous borders in every segmentation map. The proposed MSAM investigates multi-scale feature correlation and captures the useful correspondences at different scales, resulting in better segmentation performance.
- We propose a novel module named PNCM, in which the feature embedding (mean) and uncertainty (variance), drawn from a Gaussian distribution, are learned simultaneously. A large variance produced by PNCM indicates that the model is uncertain about which category the pixels in the target slice should be assigned to. Moreover, a novel uncertainty loss is proposed to prevent the model from overfitting to noisy slices. Thus, the proposed PNCM alleviates the adverse effects of noisy samples and leads to better generalization.
- We enlarge our experimental dataset, which includes not only the Extended ACDC dataset (Wang et al., 2019a) but also the ACDC dataset (Bernard et al., 2018) and the MM-WHS dataset (Zhuang et al., 2019), to validate the model generalization. The extensive experimental results quantitatively and qualitatively show that the proposed DeU-Net achieves state-of-the-art performance on commonly used metrics on the Extended ACDC dataset and also achieves competitive performance on the other two datasets.
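The mean/variance modelling described for PNCM can be sketched as follows, assuming a Gaussian feature distribution whose sample is drawn with the reparameterization trick (mean plus standard deviation times unit Gaussian noise). The uncertainty loss of the paper is not specified in this text, so the KL divergence to a standard normal used below is only a stand-in regularizer of the kind used in variational autoencoders; the function names and the numbers are hypothetical.

```python
import math
import random


def sample_feature(mean, log_var, rng):
    """Draw one feature sample as mean + sigma * eps with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, sigma^2) || N(0, 1) ), averaged over feature dimensions.

    Assumed stand-in regularizer, not the uncertainty loss of the paper.
    """
    terms = [
        -0.5 * (1.0 + lv - m * m - math.exp(lv))
        for m, lv in zip(mean, log_var)
    ]
    return sum(terms) / len(terms)


rng = random.Random(0)
mean = [0.8, -0.3]                # hypothetical feature embedding
confident_log_var = [-6.0, -6.0]  # small variance: samples stay near the mean
uncertain_log_var = [1.0, 1.0]    # large variance: samples scatter widely

confident_sample = sample_feature(mean, confident_log_var, rng)
uncertain_sample = sample_feature(mean, uncertain_log_var, rng)
regularizer = kl_to_standard_normal(mean, uncertain_log_var)
print(confident_sample, uncertain_sample, regularizer)
```

Predicting the log-variance rather than the variance keeps the standard deviation positive without an explicit constraint, which is a common design choice for this kind of module.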
The rest of this paper is organized as follows. In Section 2, we discuss the main components of DeU-Net. In Section 3, we present the details of the experimental settings. The experimental results of our proposed architecture and of existing approaches are shown in Section 4. In Section 5, we conduct extensive experiments to demonstrate the benefits of TDAM, EDAN, and PNCM for segmentation performance. Finally, we conclude this paper in Section 6.
Section snippets
Methods
In this section, we introduce the proposed DeU-Net for 3D cardiac cine MRI segmentation, which consists of three modules, i.e., Temporal Deformable Aggregation Module (TDAM), Enhanced Deformable Attention Network (EDAN), and Probabilistic Noise Correction Module (PNCM), as shown in Fig. 1. A sequence of consecutive cardiac MR slices, a target slice and its neighboring reference slices, is first fed into TDAM which consists of a temporal deformable convolution layer and an offset prediction
Experiments
In this section, we present the experimental results to validate the effectiveness of our proposed DeU-Net method. Note that the DeU-Net 2.0 approach is called DeU-Net in this paper, while the DeU-Net method of our conference paper (Dong et al., 2020) is referred to as DeU-Net 1.0 for comparison.
Results
In this section, we present the experimental results of our proposed method on the Extended ACDC dataset (Wang et al., 2019a), compared with state-of-the-art approaches. Note that, to demonstrate the generalization ability of DeU-Net 2.0, we further conduct experiments on the ACDC (Bernard et al., 2018) and the MM-WHS (Zhuang et al., 2019) datasets.
Discussions
In this section, we further conduct ablation studies to numerically analyze the benefits of the three key modules of the proposed DeU-Net 2.0, i.e., TDAM, EDAN, and PNCM, as listed in Table 4. Note that, model (ix) represents DeU-Net 2.0.
Summary
In this paper, we propose an enhanced Deformable U-Net (DeU-Net) for 3D cardiac cine MRI segmentation, including the following three parts: Temporal Deformable Aggregation Module (TDAM), Enhanced Deformable Attention Network (EDAN), and Probabilistic Noise Correction Module (PNCM). A sequence of consecutive cardiac MR slices, a target slice and its neighboring reference slices, is first fed into TDAM, which consists of a temporal deformable convolutional layer and an offset prediction network
CRediT authorship contribution statement
Shunjie Dong: Investigation, Methodology, Software, Visualization, Writing – original draft. Zixuan Pan: Data curation, Software, Validation, Resources, Writing – review & editing. Yu Fu: Data curation, Resources, Writing – original draft, Funding acquisition, Writing – review & editing. Qianqian Yang: Conceptualization, Supervision, Writing – original draft, Funding acquisition, Writing – review & editing. Yuanxue Gao: Data curation, Writing – review & editing. Tianbai Yu: Data curation,
Declaration of Competing Interest
The authors confirm that there are no conflicts of interest.
Acknowledgements
This work was supported by grants from the National Science Foundation of China (No. 62034007 and No. 62141404), the Zhejiang Provincial Innovation Team Project under No. 2020R01001, the Fundamental Research Funds for the Central Universities under Grant 2021FZZX001-20, and the Zhejiang Lab’s International Talent Fund for Young Professionals under No. ZJ2020JS013.
References (53)
- et al. Improving cardiac mri convolutional neural network segmentation on small training datasets and dataset shift: a continuous kernel cut approach. Med Image Anal (2020)
- et al. Efficient and robust instrument segmentation in 3d ultrasound using patch-of-interest-fusenet with hybrid loss. Med Image Anal (2021)
- et al. Evaluation of algorithms for multi-modality whole heart segmentation: an open-access grand challenge. Med Image Anal (2019)
- et al. An exploration of 2D and 3D deep learning techniques for cardiac mr image segmentation. International Workshop on Statistical Atlases and Computational Models of the Heart (2017)
- et al. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans Med Imaging (2018)
- Introduction to noninvasive cardiac imaging. Circulation (2012)
- et al. Data uncertainty learning in face recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)
- et al. Combining fully convolutional and recurrent neural networks for 3d biomedical image segmentation. Advances in neural information processing systems (2016)
- et al. Deep meta learning for real-time target-aware visual tracking. Proceedings of the IEEE/CVF International Conference on Computer Vision (2019)
- et al. Deformable convolutional networks. Proceedings of the IEEE international conference on computer vision (2017)
- Spatio-temporal deformable convolution for compressed video quality enhancement. Proceedings of the AAAI conference on artificial intelligence
- Rconet: deformable mutual information maximization and high-order uncertainty-aware learning for robust covid-19 detection. IEEE Trans Neural Netw Learn Syst
- Deu-net: Deformable u-net for 3d cardiac mri video segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention
- Super-resolution from a single image. 2009 IEEE 12th International Conference on Computer Vision
- Whole heart segmentation using 3d fm-pre-resnet encoder–decoder based architecture with variational autoencoder regularization. Applied Sciences
- Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. International workshop on statistical atlases and computational models of the heart
- Automatic segmentation of LV and RV in cardiac MRI. International Workshop on Statistical Atlases and Computational Models of the Heart
- Inter-observer variability of manual contour delineation of structures in ct. Eur Radiol
- Video super-resolution with convolutional neural networks. TCI
- Large-scale video classification with convolutional neural networks. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition
- Densely connected fully convolutional network for short-axis cardiac cine mr image segmentation and heart diagnosis using random forest. International Workshop on Statistical Atlases and Computational Models of the Heart
- Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114
- Mucan: Multi-correspondence aggregation network for video super-resolution. European Conference on Computer Vision
- Non-local recurrent network for image restoration. arXiv preprint arXiv:1806.02919
- Attention u-net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999
- 2d-3d fully convolutional neural networks for cardiac mr segmentation. International Workshop on Statistical Atlases and Computational Models of the Heart