A contrastive consistency semi-supervised left atrium segmentation model

https://doi.org/10.1016/j.compmedimag.2022.102092

Highlights

  • A semi-supervised LA segmentation framework to leverage unlabeled data for automatic and accurate LA segmentation.

  • A contrastive consistency loss function based on the class-vector for learning more distinguishable representations.

  • A classification model based on the class-aware information to improve the performance of segmentation model.

Abstract

Accurate segmentation of the left atrium (LA) is a key step in the clinical diagnosis and therapy of atrial fibrillation. In clinical practice, semantic-level segmentation of the LA consumes considerable time and labor. Although supervised deep learning methods can partly solve this problem, a highly efficient deep learning model requires abundant labeled data that are hard to acquire. Therefore, research on automatic LA segmentation that leverages unlabeled data is highly desirable. In this paper, we propose a semi-supervised LA segmentation framework consisting of a segmentation model and a classification model. The segmentation model takes volumes from both labeled and unlabeled data as input and generates predictions of LAs. A classification model then maps these predictions to class-vectors for each input. Afterward, to leverage the class information, we construct a contrastive consistency loss function based on these class-vectors, so that the model can enlarge the inter-class discrepancy and compact the intra-class similarity to learn more distinguishable representations. Moreover, we set the class-vectors from the labeled data as references for the class-vectors from the unlabeled data to relieve the influence of unreliable predictions on the unlabeled data. Finally, we evaluate our semi-supervised LA segmentation framework on a public LA dataset using four universal metrics and compare it with recent state-of-the-art models. The proposed model achieves the best performance on all metrics, with a Dice score of 89.81 %, Jaccard of 81.64 %, 95 % Hausdorff distance of 7.15 mm, and average surface distance of 1.82 mm. The outstanding performance of the proposed framework shows that it can contribute significantly to assisting the therapy of patients with atrial fibrillation. Code is available at: https://github.com/PerceptionComputingLab/SCC.

Introduction

Atrial fibrillation (AF) is a common heart disease, and its risk increases with age (Feinberg et al., 1995). Patients with AF may experience heart palpitations, breathlessness, low energy, and an increased risk of stroke (Center, 2009). Catheter ablation is a current routine therapy for patients with AF (Kalla et al., 2017). However, the success rate of catheter ablation is unsatisfactory, and AF recurrence and a second ablation often follow (Chelu et al., 2018). According to clinical experience, ablation strategies and AF recurrence are dominated by the degree of atrial fibrosis and the ablation-related scar (Akoum et al., 2011; Wu et al., 2021). Learning the topology of the left atrium (LA) is therefore crucial for evaluating the degree of atrial fibrosis and the ablation-related scar in patients with AF. To improve the success rate of catheter ablation, accurate segmentation of the LA in medical images is a critical process that can help clinicians understand the topology of the LA, assess the risk of AF, and make patient-specific treatment plans. Recently, late gadolinium-enhanced MRI (LGE MRI) has provided promising visualization of myocardial scar tissue by brightening scar signal intensities to differentiate them from healthy tissue, which, however, results in poor boundaries of the LA (Yang et al., 2020). LA segmentation involves the LA cavity, pulmonary veins, LA appendage, etc. These complex structures and the fuzzy-boundary problem make acquiring semantic-level labels of the LA time-consuming and labor-intensive. Therefore, accurate and automatic segmentation of the LA in LGE MRI is a challenging and necessary task.

Over the past few years, deep learning models have achieved impressive improvements on several medical image segmentation tasks (Shen et al., 2017). However, a highly efficient supervised deep learning model requires abundant labeled data, and this need for plenty of densely annotated data slows down the adoption of deep learning in medical image analysis. On the other hand, large amounts of unlabeled data may become available with the development of intelligent medical information technology (Cheplygina et al., 2019). Hence, research on leveraging unlabeled data for medical image analysis is in high demand.

In this work, we focus on semi-supervised learning (SSL) to learn representations from both labeled and unlabeled data for LA segmentation. SSL is an intermediate approach between supervised and unsupervised learning (Chapelle et al., 2006), and its efficiency has been verified in many computer vision tasks (Van Engelen and Hoos, 2020). Typically, SSL attempts to train a model with a limited amount of labeled data and a large amount of unlabeled data. The unlabeled data supervise the model in a self-training manner through consistency regularization, which is based on the assumption that the model's predictions should be consistent under minor perturbations of the same input (Van Engelen and Hoos, 2020); a minimal sketch of this idea follows this paragraph. Notably, the scope of this work is standard SSL, in which the labeled and unlabeled data share the same categories and modality (e.g., MRI).
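To make the consistency assumption concrete, the following is a minimal sketch of consistency regularization for an unlabeled batch, assuming a PyTorch segmentation network `net` that maps a volume batch to per-class logits. The function name and the Gaussian-noise perturbation are illustrative assumptions, not the exact formulation used later in this paper.

```python
import torch
import torch.nn.functional as F

def consistency_loss(net, unlabeled, noise_std=0.1):
    """Penalize disagreement between predictions for an unlabeled volume
    and a slightly perturbed copy of it (additive Gaussian noise here)."""
    with torch.no_grad():
        target = torch.softmax(net(unlabeled), dim=1)      # prediction on the clean input
    perturbed = unlabeled + noise_std * torch.randn_like(unlabeled)
    pred = torch.softmax(net(perturbed), dim=1)            # prediction under perturbation
    return F.mse_loss(pred, target)                        # consistency penalty
```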

Recently, several LA segmentation works have used SSL to relieve the requirement of expensive dense annotations for deep learning models. Most of these SSL models for LA segmentation are based on consistency regularization. Specifically, they either make model predictions consistent between the original unlabeled data and randomly perturbed versions of it (e.g., noise, scaling) or make the model learn distribution consistency between labeled and unlabeled data through adversarial learning. Because the consistency is calculated on predictions for unlabeled data (also called pseudo labels), false predictions can make the training unstable. To mitigate the effect of unreliable predictions on the stability of training, UA-MT leveraged an uncertainty map of the predictions for perturbed data to filter out high-uncertainty regions (Yu et al., 2019); a hedged sketch of this mechanism follows this paragraph. This model adopted the mean-teacher framework (Tarvainen and Valpola, 2017), which requires two networks and multiple forward passes to estimate the uncertainty. To reduce the time and memory cost, Wu et al. designed a network with two decoders and used the discrepancy between the two predictions as uncertainty information to construct an unsupervised loss (Wu et al., 2021b). However, this model only considered consistency at the output level. To embed geometric information into training, Li et al. (2020) took distance map regression as an auxiliary task and adopted a discriminator to distinguish the source of the predicted distance map, learning representations from unlabeled data while learning shape information. Following this work, Luo et al. (2021a, 2021b) extended the concept of consistency to the task level and proposed a dual-task model that jointly optimizes the segmentation task and a distance map regression task to utilize geometric information and unlabeled data at the same time. Most of these models leverage unlabeled data by forcing the model to be consistent at either the image/output level or the feature level (Wang et al., 2020). They ignore class-level information and are therefore class-agnostic approaches, yet class-level information is crucial for improving the distinguishability of the segmentation model.
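As an illustration of the uncertainty-filtering idea described above, the sketch below masks the consistency term with the teacher's predictive entropy so that only confident voxels contribute. The `student`/`teacher` names, the Monte Carlo noise sampling, and the threshold value are assumptions for illustration, not the published UA-MT implementation.

```python
import torch

def uncertainty_masked_consistency(student, teacher, unlabeled,
                                   n_samples=4, noise_std=0.1, threshold=0.75):
    """Consistency loss on unlabeled volumes, keeping only voxels whose
    teacher prediction has entropy below a threshold."""
    with torch.no_grad():
        # Monte Carlo estimate of the teacher prediction under input noise.
        probs = torch.stack([
            torch.softmax(teacher(unlabeled + noise_std * torch.randn_like(unlabeled)), dim=1)
            for _ in range(n_samples)
        ]).mean(dim=0)                                           # (B, C, D, H, W)
        entropy = -(probs * torch.log(probs + 1e-6)).sum(dim=1, keepdim=True)
        mask = (entropy < threshold).float()                     # confident voxels only
    student_probs = torch.softmax(student(unlabeled), dim=1)
    per_voxel = (student_probs - probs) ** 2
    return (mask * per_voxel).sum() / (mask.sum() * per_voxel.shape[1] + 1e-6)
```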

Contrastive learning has achieved major advances in self-supervised representation learning. Its main idea is to pull positive samples together and push negative samples apart. The sample construction strategy is commonly based on data augmentations at the image level: augmentations of the same input are positive samples, and all other data are negative samples (Khosla et al., 2020; Chaitanya et al., 2020). Contrastive learning has shown great potential and achieved state-of-the-art results in downstream visual tasks (He et al., 2020; Chen et al., 2020); a minimal sketch of this image-level loss follows this paragraph. However, the representations learned by contrastive learning are usually at the image level, which is too coarse for the semantic segmentation task. To learn more specific representations, Chaitanya et al. (2020) proposed a local version of contrastive learning that encourages the model to learn local representations. Following this local contrastive learning idea, Xiang et al. embedded a contrastive loss at the feature level for SSL based on a teacher-student model (Xiang et al., 2021). Although these models construct samples at the local or feature level, class information is still ignored.
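For reference, the image-level contrastive idea summarized above can be written as an NT-Xent/InfoNCE-style loss over two augmented views of each image in a batch. This is an illustrative sketch under common conventions, not the exact loss of the cited works.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Each embedding's positive is its other view; all remaining embeddings
    in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                          # (2N, D)
    sim = z @ z.t() / temperature                           # pairwise cosine similarities
    n = z1.shape[0]
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))         # exclude self-similarity
    targets = torch.arange(2 * n, device=z.device).roll(n)  # index of each positive
    return F.cross_entropy(sim, targets)
```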

Inspired by the idea of contrastive learning (Chen et al., 2020; Chaitanya et al., 2020; Khosla et al., 2020; Chen et al., 2021), we embed a contrastive consistency loss at the class level in an unsupervised manner to enable class-aware SSL. To learn class-level representations, we construct a classification model that follows the segmentation model, takes the segmentation predictions as input, and maps them into a class-vector space. Then, we treat class-vectors of the same class as intra-class samples and class-vectors of different classes as inter-class samples. Finally, the contrastive consistency loss based on these samples is combined with the supervised segmentation loss to jointly optimize the segmentation framework; a hedged sketch is given after this paragraph.
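In the sketch below, a classification head is assumed to produce one class-vector per class for each prediction, and class-vectors from unlabeled data are pulled toward the labeled class-vector of the same class (intra-class) and pushed away from those of other classes (inter-class). The function name, the cosine-similarity form, and the margin are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def class_contrastive_consistency(vec_labeled, vec_unlabeled, margin=0.0):
    """vec_labeled, vec_unlabeled: (C, D) tensors holding one D-dimensional
    class-vector per class (C >= 2). Labeled vectors act as fixed references."""
    ref = F.normalize(vec_labeled.detach(), dim=1)   # references: no gradient to labeled branch
    qry = F.normalize(vec_unlabeled, dim=1)
    sim = qry @ ref.t()                              # (C, C) cosine similarities
    pos = sim.diag()                                 # intra-class (same-class) pairs
    C = sim.shape[0]
    off_diag = ~torch.eye(C, dtype=torch.bool, device=sim.device)
    neg = sim[off_diag].view(C, C - 1)               # inter-class pairs
    # Pull intra-class pairs together and push inter-class pairs apart.
    return (1.0 - pos).mean() + F.relu(neg - margin).mean()
```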

In summary, the main contributions of our model are threefold:

Firstly, we propose a class-aware semi-supervised LA segmentation framework. Compared with class-agnostic SSL models, the framework can leverage class-level information to learn representations from both labeled and unlabeled data and thus improve the distinguishability of the segmentation model.

Secondly, we propose a contrastive consistency loss on the class-vector space. Compared with sample construction strategies at the image level, our class-level sample construction strategy enables the model to learn more distinguishable representations that benefit the pixel-level segmentation task. Moreover, we set the samples of labeled data as references for the samples of unlabeled data to alleviate the effect of unreliable predictions for unlabeled data.

Thirdly, we verified our framework on the popular left atrial segmentation dataset and performed extensive ablation and comparative experiments. Both quantitative and qualitative results demonstrate the superiority of the proposed framework.

Section snippets

Materials and methods

In this section, we introduce the details of the proposed semi-supervised LA segmentation framework. We first briefly present the data involved in this work. Afterward, we describe the details of our framework and the loss functions. Finally, the implementation details and metrics are described.

Comparative Experiments and Results

First, we compared our framework with four state-of-the-art semi-supervised LA segmentation works, including the uncertainty-aware mean teacher approach (UA-MT) (Yu et al., 2019), the shape-aware adversarial network (SASSNet) (Li et al., 2020), the local and global structure-aware entropy regularized mean teacher model (LG-ER-MT) (Hang et al., 2020), and the dual-task consistency framework (DTC) (Luo et al., 2021a). Table 1 shows the quantitative comparison of these methods. The first two

Discussion

In this work, we aimed to develop a class-aware semi-supervised LA segmentation framework on LGE MRI for patients with AF. Extensive experiments demonstrate that learning representations from the unlabeled data during training can improve segmentation performance. Current mainstream semi-supervised LA segmentation works focus on a consistency regularization strategy to leverage the unlabeled data. However, this kind of model usually requires a complex structure, such as a mean teacher with

5. Conclusions

In this study, we constructed a semi-supervised LA segmentation framework with a segmentation model followed by a classification model. The E2DNet takes patches as input and predicts probability maps for each class, and the classification model maps these probability maps into the class-vector space. Finally, the framework is supervised by the segmentation loss on labeled data and self-supervised by the contrastive consistency loss between labeled and unlabeled data. Thanks to the

CRediT authorship contribution statement

Yashu Liu: Conceptualization; Formal analysis; Investigation; Methodology; Validation; Visualization; Writing - original draft. Wei Wang: Funding acquisition; Supervision; Writing - review & editing. Gongning Luo: Funding acquisition; Project administration; Writing - review & editing. Kuanquan Wang: Funding acquisition; Project administration; Supervision; Writing - review & editing. Shuo Li: Investigation; Methodology; Validation; Writing - review & editing.

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers 62001141, 62001144]; and the Science and Technology Innovation Committee of Shenzhen Municipality [grant number JCYJ20210324131800002].

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the authors of Yu et al. (2019) and Ma et al. (2020); their code repositories are the foundation of our work. We also thank the organizers of the 2018 Atrial Segmentation Challenge for publishing the LA segmentation dataset.

References

  • Chen, J., et al., 2022. Adaptive hierarchical dual consistency for semi-supervised left atrium segmentation on cross-domain data. IEEE Trans. Med. Imaging.
  • Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A Simple Framework for Contrastive Learning of Visual...
  • Chen, T., Luo, C., Li, L., 2021. Intriguing properties of contrastive losses. In: Advances in Neural Information...
  • Dice, L.R., 1945. Measures of the amount of ecologic association between species. Ecology.
  • Feinberg, W.M., et al., 1995. Prevalence, age distribution, and gender of patients with atrial fibrillation: analysis and implications. Arch. Intern. Med.
  • Milletari, F., Navab, N., Ahmadi, S.-A., 2016. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image...
  • Gerig, G., Jomier, M., Chakos, M., 2001. Valmet: A New Validation Tool for Assessing and Improving 3D Object...
  • Hang, W., Feng, W., Liang, S., Yu, L., Wang, Q., Choi, K., Qin, J., 2020. Local and Global Structure-Aware Entropy...
  • He, K., Fan, H., Wu, Y., Xie, S., Girshick, R., 2020. Momentum Contrast for Unsupervised Visual Representation...