
Medical Image Analysis

Volume 83, January 2023, 102673

Deep semi-supervised multiple instance learning with self-correction for DME classification from OCT images

https://doi.org/10.1016/j.media.2022.102673

Highlights

  • A novel deep semi-supervised multiple instance learning framework for OCT analysis.

  • Dynamic weighting and self-correction could alleviate the impact of label noise.

  • Dual consistency regularization is proposed to exploit unlabeled data for training.

  • Significant improvements on two DME OCT datasets compared to the state of the art.

Abstract

Supervised deep learning has achieved prominent success in various diabetic macular edema (DME) recognition tasks from optical coherence tomography (OCT) volumetric images. A common problem in this field is the shortage of labeled data, because fine-grained annotations are expensive to obtain, which makes accurate supervised analysis substantially more difficult. The morphological changes in the retina caused by DME may be sparsely distributed across the B-scan images of an OCT volume, and OCT data are often coarsely labeled at the volume level. Hence, the DME identification task can be formulated as a multiple instance classification problem that can be addressed by multiple instance learning (MIL) techniques. Nevertheless, none of the previous studies simultaneously exploit unlabeled data to improve classification accuracy, which is particularly important for achieving high-quality analysis at minimal annotation cost. To this end, we present a novel deep semi-supervised multiple instance learning framework to explore the feasibility of leveraging a small amount of coarsely labeled data and a large amount of unlabeled data to tackle this problem. Specifically, we design several modules that further improve performance according to the availability and granularity of labels. To warm up the training, we propagate the bag labels to the corresponding instances as supervision, and propose a self-correction strategy to handle label noise in the positive bags. This strategy is based on confidence-based pseudo-labeling with consistency regularization: the model uses its prediction on a weakly augmented input to generate a pseudo-label only if it is highly confident about that prediction, and the pseudo-label is subsequently used to supervise a strongly augmented version of the same input. This learning scheme is also applicable to unlabeled data. To enhance the discrimination capability of the model, we introduce the Student–Teacher architecture and impose consistency constraints between the two models. The proposed approach was evaluated on two large-scale DME OCT image datasets. Extensive results indicate that the proposed method improves DME classification by incorporating unlabeled data and significantly outperforms competing MIL methods, which confirms the feasibility of deep semi-supervised multiple instance learning at a low annotation cost.

Introduction

Diabetes mellitus is the global epidemic of the 21st century. The global diabetes prevalence is estimated to be 10.2% (578 million) by 2030 and 10.9% (700 million) by 2045 (Saeedi et al., 2019, Teo et al., 2021). Diabetic retinopathy (DR), a specific microvascular complication of diabetes, is a leading cause of blindness in the working-age population of most developed countries (Ciulla et al., 2003, Lee et al., 2015, Wong et al., 2016). At any time during the progression of DR, patients with diabetes can also develop diabetic macular edema (DME), which involves retinal thickening in the macular area, increased vascular permeability and the deposition of hard exudates at the central retina (Ferris III and Patz, 1984, Ciulla et al., 2003, Mohamed et al., 2007, Wong et al., 2016, Tan et al., 2017, Schmidt-Erfurth et al., 2017). With a variety of locations and sizes, DME can be further classified into center-involved DME (CI-DME), defined as retinal thickening or the presence of DME features in the macula involving the central subfield zone (1 mm in diameter) of the Early Treatment Diabetic Retinopathy Study (ETDRS) grid, and non-center-involved DME (non-CI-DME), defined as retinal thickening or the presence of DME features in the macula not involving the central subfield zone. Retinal thickening was defined according to DRCR.net protocol-defined thresholds (≥320 μm for men and ≥305 μm for women on Spectralis OCT; ≥305 μm for men and ≥290 μm for women on Cirrus OCT) and the Moorfields DME study (≥350 μm on Topcon OCT) (Wells et al., 2015, Patrao et al., 2016).
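For illustration only, the sex- and device-specific thresholds quoted above can be written as a small lookup. The function and names below are hypothetical (not from the paper or the cited protocols) and ignore the lesion-location criterion that distinguishes CI-DME from non-CI-DME, so this is a simplified sketch rather than a clinical grading rule.

    # Hypothetical helper: applies the sex- and device-specific central subfield
    # thickness thresholds quoted above to flag retinal thickening. Real CI-DME vs.
    # non-CI-DME grading also depends on whether the lesion involves the ETDRS
    # central subfield, which this simplification ignores.
    THICKENING_THRESHOLDS_UM = {
        # (device, sex) -> threshold in micrometres
        ("spectralis", "male"): 320,
        ("spectralis", "female"): 305,
        ("cirrus", "male"): 305,
        ("cirrus", "female"): 290,
        ("topcon", "male"): 350,    # Moorfields DME study threshold (not sex-specific)
        ("topcon", "female"): 350,
    }

    def is_retina_thickened(thickness_um: float, device: str, sex: str) -> bool:
        """Return True if the central subfield thickness meets the protocol threshold."""
        return thickness_um >= THICKENING_THRESHOLDS_UM[(device.lower(), sex.lower())]

    print(is_retina_thickened(330, "Spectralis", "male"))   # True
    print(is_retina_thickened(300, "Cirrus", "female"))     # True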

At present, DME has become the principal cause of vision loss in people with diabetes, and the increasing number of individuals with diabetes worldwide indicates that DME will continue to be the major contributor to vision loss and associated functional impairment for years to come (Ciulla et al., 2003, Tan et al., 2017). Early detection and timely treatment are important to prevent further visual loss in DME patients.

In clinical practice, spectral-domain optical coherence tomography (Huang et al., 1991) is the most widely used imaging modality for DME diagnosis and patient management due to its many unique merits compared to fundus photography. For instance, OCT is noninvasive and can provide precise visualization of the morphology of the macula and retinal layers in multiple high-resolution cross-sectional images (i.e., B-scans). However, the high dimensionality of OCT scans also increases the burden on ophthalmologists, as it demands that qualified experts examine every single B-scan image carefully to make a convincing diagnosis (DME or not). Several typical B-scan examples of normal and DME cases are depicted in Fig. 1. Automatic recognition of DME from OCT images is therefore of great clinical significance: it could largely reduce the subjectivity of ophthalmologists, speed up the diagnostic workflow, and also improve accuracy. It is especially important for the early detection of DME.

Thanks to advances in data storage and OCT imaging technology, OCT data have grown dramatically in scale, both in number and in image resolution, which enables OCT-related research on large-scale, high-resolution datasets. However, a challenge follows, namely the lack of annotated data, which has been frequently mentioned in this field (Cheplygina et al., 2019). Annotating OCT data is highly demanding, time-consuming and laborious. In particular, the signs of DME often appear sparsely across the entire OCT volume, making the acquisition of thorough, fine-grained annotations of DME lesions quite expensive, and even impossible in certain situations. An example of an eye with non-CI-DME on a Topcon Triton radial scan is illustrated in Fig. 2. The DME lesion can be visualized on only one B-scan (in the yellow circles within the 4th B-scan image). Therefore, it is important in clinical practice to produce an overall output (i.e., at the volume level) from a series of B-scans.

Compared to pixel-level or lesion-level annotations, coarse annotations (volume-level labels) are much easier, faster, and more feasible to acquire in clinical practice; they provide a global diagnosis of the whole OCT volume (e.g., DME or normal) from a high-level perspective. Moreover, there is a huge amount of unlabeled OCT data stored in hospital databases, which can be readily utilized for medical image analysis.

Over the past few years, researchers have constructed different OCT image datasets for a series of ophthalmic disease recognition tasks, including retinal nerve fiber layer segmentation (Garvin et al., 2008, Mayer et al., 2010), specific lesion (e.g., cysts) segmentation (Quellec et al., 2010, Schlegl et al., 2015) and disease screening (Venhuizen et al., 2015, Liu et al., 2011, Farsiu et al., 2014, Wang et al., 2020b, Wang et al., 2020a). However, the publicly available OCT datasets contain either individual B-scan images that are elaborately selected from the horizontal fovea cut of OCT volumes (Kermany et al., 2018) or only a very small number of OCT volumes (Srinivasan et al., 2014, Farsiu et al., 2014), which are far from sufficient for comprehensive analysis at both the B-scan and volume levels. Given volume-level labels only, the identification of DME from OCT volumes can be formulated as a typical multiple instance classification (MIC) problem, since the lesions always have a sparse distribution across B-scans. A variety of algorithms built on this weak supervisory signal have been proposed to address the problem. Most previous works first encoded the whole OCT volume into a feature embedding (Venhuizen et al., 2015, Lemaitre et al., 2015, Fu et al., 2016, Lemaître et al., 2016, Venhuizen et al., 2017, Alsaih et al., 2017, Mousavi et al., 2019, Sun and Sun, 2019), and then built a volume-level classifier to yield the diagnostic prediction (Albarrak et al., 2013, Srinivasan et al., 2014, Farsiu et al., 2014, Venhuizen et al., 2015, Lemaitre et al., 2015, Venhuizen et al., 2017, Liu et al., 2011). Among these works, handcrafted features (Lemaitre et al., 2015, Fu et al., 2016, Alsaih et al., 2017, Mousavi et al., 2019) or hidden-layer features from deep neural networks (Wang et al., 2020b) were extracted and aggregated into a global feature representation. In addition, some researchers devoted efforts to building a robust B-scan classifier that can accurately recognize B-scan images carrying diagnostic information. In this manner, each B-scan image corresponds to a single prediction score, and a volume-level fusion strategy is subsequently applied to aggregate these B-scan scores into the final prediction (Perdomo et al., 2018, Qiu and Sun, 2019, Wang et al., 2020b). Another line of research formulates the problem as an anomaly detection task (Sidibe et al., 2017, Seeböck et al., 2018, Schlegl et al., 2019), in which only healthy images are involved in model learning. Once the normal appearance pattern is well captured, the model recognizes disease in unseen OCT images by detecting outliers as abnormal cases.
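To make the "B-scan classifier plus volume-level fusion" formulation concrete, the following minimal sketch scores every B-scan of a volume with a shared CNN and applies max pooling over the instance scores, following the standard MIL assumption that a volume is positive if at least one B-scan is positive. The backbone, input size and fusion rule here are illustrative assumptions, not the architecture of any cited work.

    # Minimal MIL sketch: a shared CNN scores each B-scan (instance) of an OCT
    # volume (bag), and max pooling over instance scores yields the bag prediction.
    import torch
    import torch.nn as nn

    class BScanMILClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            # Hypothetical lightweight backbone; any 2D CNN could play this role.
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.instance_head = nn.Linear(32, 1)  # one DME score per B-scan

        def forward(self, volume):
            # volume: (num_bscans, 1, H, W) -- one OCT volume treated as a bag
            instance_logits = self.instance_head(self.backbone(volume)).squeeze(1)
            bag_logit = instance_logits.max()  # MIL max-pooling fusion
            return bag_logit, instance_logits

    model = BScanMILClassifier()
    bag_logit, inst = model(torch.randn(12, 1, 128, 128))  # e.g., 12 B-scans
    print(torch.sigmoid(bag_logit).item(), inst.shape)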

Considering that the hidden structure in unlabeled OCT images is often useful, we are motivated to exploit unlabeled OCT images for DME OCT classification, which enables learning from only a small set of labeled data. So far, hardly any research has been reported on semi-supervised multiple instance learning for DME identification from OCT volumes. This is mainly because, in the multiple instance learning (MIL) setting, the performance of semi-supervised learning (SSL) is affected by the ambiguity that positive instances are not directly given, which makes it difficult to find the correct target proxy for unlabeled instances during training. To address this problem, we propose a novel deep semi-supervised multiple instance learning framework in which an instance-level classifier is trained in an end-to-end manner. Specifically, we first initialize all training instances with the volume labels to warm up the training phase, and the network is partially supervised by this weak supervisory signal. However, such supervision can hinder the discrimination capability of the model because of the confusion caused by noisy instances. To mitigate this issue, we propose two strategies. First, we gradually weaken the impact of this ‘supervision’ over training steps by scaling the corresponding cross-entropy loss with a dynamic factor after each training epoch. Second, we design a self-correction strategy to filter out unconvincing instances in positive bags during training. More concretely, it is based on confidence-based pseudo-labeling and consistency regularization. For each instance from the positive bags, the model uses its own prediction on the weakly augmented input to generate a pseudo-label, which is then used to supervise a strongly augmented version of the same input. We apply the same procedure to unlabeled data, since it also effectively mines semantic information from unlabeled data (Sohn et al., 2020). Additionally, we introduce the Student–Teacher scheme (Tarvainen and Valpola, 2017) to promote the discrimination capability of the model. The teacher model is an exponential moving average (EMA) of consecutive student models over training steps and tends to become more accurate than using the final weights of the student model directly. We therefore generate pseudo-labels for instances originating from positive and unlabeled bags using the predictions of the teacher model. We then allow the two models to learn from their disagreements via a shared consistency regularization loss on all training data under the same or different perturbation(s); a simplified code sketch of this scheme is given after the list of contributions below. To summarize, our work has the following contributions:

  • 1.

    To the best of our knowledge, this is the first work that presents a deep semi-supervised multiple instance learning framework for DME identification from OCT scans using a small amount of coarsely labeled data and a large amount of unlabeled data, which is a promising solution for real-world DME OCT screening given the cost of expert annotation and the ready availability of data.

  • 2.

    To tackle the label ambiguity in positive bags, we design a dynamic weighting technique to gradually weaken the bag-label supervision and a self-correction strategy to retain the most convincing instances for training, which can effectively alleviate the impact of label noise and thus boost the model robustness.

  • 3.

    To efficiently exploit unlabeled data, we propose to diversify the consistency constraints by considering both the model weights and the augmentations of the inputs. We take advantage of the Student–Teacher architecture and apply weak/strong augmentation to the inputs of the two models. Hence, the discrimination capability of the model can be greatly improved.

  • 4.

    We conduct extensive ablation studies to verify the efficacy of each component of the framework, and the prominent classification performance on two large-scale DME OCT image datasets demonstrates the feasibility and effectiveness of our method, outperforming several state-of-the-art deep multiple instance learning methods by a large margin.
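As a concrete illustration of the self-correction and Student–Teacher scheme outlined above, the sketch below shows one training step of confidence-based pseudo-labeling with weak/strong augmentation and an EMA teacher update. The confidence threshold, augmentations, EMA decay and toy model are placeholder assumptions rather than the paper's implementation, so this should be read as a minimal sketch of the idea.

    # One illustrative training step: the EMA teacher pseudo-labels weakly augmented
    # instances, only confident pseudo-labels are kept, and they supervise the
    # student on the strongly augmented views (FixMatch-style self-correction).
    import copy
    import torch
    import torch.nn.functional as F

    def ema_update(teacher, student, decay=0.999):
        """Teacher weights are an exponential moving average of student weights."""
        with torch.no_grad():
            for t_p, s_p in zip(teacher.parameters(), student.parameters()):
                t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

    def self_correction_step(student, teacher, weak_batch, strong_batch, tau=0.95):
        """Pseudo-label weak views with the teacher, keep only confident instances,
        and supervise the student on the corresponding strong views."""
        with torch.no_grad():
            teacher_probs = torch.sigmoid(teacher(weak_batch)).squeeze(1)
        pseudo_labels = (teacher_probs > 0.5).float()
        confident = (teacher_probs > tau) | (teacher_probs < 1.0 - tau)

        student_logits = student(strong_batch).squeeze(1)
        if confident.any():
            loss = F.binary_cross_entropy_with_logits(
                student_logits[confident], pseudo_labels[confident])
        else:
            loss = student_logits.sum() * 0.0  # no confident instance this step
        return loss

    # Toy usage: a linear "model" stands in for the B-scan classifier.
    student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(128 * 128, 1))
    teacher = copy.deepcopy(student)
    weak = torch.randn(8, 1, 128, 128)              # stand-in weakly augmented views
    strong = weak + 0.1 * torch.randn_like(weak)    # stand-in strongly augmented views
    loss = self_correction_step(student, teacher, weak, strong)
    loss.backward()
    ema_update(teacher, student)
    print(loss.item())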

Section snippets

Related work

In this section, we first recap previous studies on DME OCT classification that are based on volume-level labels. We then review SSL techniques and their applications in medical imaging.

Method

In this work, we introduce a unified deep semi-supervised multiple instance learning framework to train an instance classifier in an end-to-end fashion. A graphical overview of the proposed framework is depicted in Fig. 3. Our method builds on top of several recent weakly supervised learning approaches, and we pioneer the study that leverages a small amount of coarsely labeled data and a large amount of unlabeled data for DME identification from OCT scans. Initially, we propagate
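As a rough sketch of the warm-up described in the Introduction, the snippet below propagates each volume (bag) label to its B-scans (instances) and scales the resulting cross-entropy term by a factor that decays after every epoch. The exponential schedule and all names are assumed examples, since the exact dynamic factor is not given in this snippet.

    # Warm-up sketch: instances inherit their bag label, and the weight on this
    # noisy bag-label supervision decays over epochs (assumed exponential schedule).
    import torch
    import torch.nn.functional as F

    def propagate_bag_labels(bag_labels, bscans_per_bag):
        """Assign each instance (B-scan) the label of the bag (volume) it came from."""
        return bag_labels.repeat_interleave(bscans_per_bag)

    def bag_supervision_weight(epoch, decay_rate=0.9):
        """Dynamic factor that gradually weakens the bag-label cross-entropy term."""
        return decay_rate ** epoch

    bag_labels = torch.tensor([1.0, 0.0])                 # one DME volume, one normal
    instance_labels = propagate_bag_labels(bag_labels, bscans_per_bag=12)
    instance_logits = torch.randn(24)                     # 2 volumes x 12 B-scans
    for epoch in range(3):
        w = bag_supervision_weight(epoch)
        loss = w * F.binary_cross_entropy_with_logits(instance_logits, instance_labels)
        print(epoch, w, loss.item())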

Dataset and evaluation metrics

We evaluated our method on two DME OCT image datasets that were collected from patients with diabetes at The Chinese University of Hong Kong Eye Center (Wang et al., 2020b), namely Triton-DME and Heidelberg-DME. The Triton-DME dataset was obtained with a Topcon Triton swept-source DRI-OCT device (Topcon, Tokyo, Japan) using a radial 9 mm * 30° scanning protocol. Each OCT volume includes 12 macula-centered, 9 mm B-scans with a resolution of 1024 × 992 pixels. The Heidelberg-DME dataset was acquired by Spectralis OCT

Discussion

In routine clinical practice, DME identification from OCT images is quite essential for individuals with diabetes. However, the high dimensionality of OCT data and the sparse distribution of DME lesions make manual examination very time-consuming and laborious. Besides, the demand for qualified ophthalmologists has far outstripped the actual supply worldwide, leaving many patients missing the best window for treatment. This issue is particularly severe for underserved populations

Conclusion

In this work, we present a deep semi-supervised multiple instance learning framework for OCT image analysis. To the best of our knowledge, this is the first study that leverages a small amount of coarsely labeled data and a large amount of unlabeled data for DME classification from OCT scans. Our framework exploits confidence-based pseudo-labeling to select the most convincing instances in the positive and unlabeled bags for consistency regularization, and further takes advantage of the

CRediT authorship contribution statement

Xi Wang: Conceptualization, Methodology, Software, Investigation, Writing – original draft, Writing – review & editing. Fangyao Tang: Data curation, Resources. Hao Chen: Methodology, Writing – review & editing, Supervision. Carol Y. Cheung: Data curation, Resources, Supervision. Pheng-Ann Heng: Supervision, Funding acquisition, Writing – review & editing, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work described in this paper was supported in part by the following grants: Key-Area Research and Development Program of Guangdong Province, China (2020B010165004); National Natural Science Foundation of China with Project No. U1813204; Hong Kong Innovation and Technology Fund (Project No. GHP/110/19SZ); Research Grants Council General Research Fund, Hong Kong (Ref No. 14102418); and Innovation and Technology Fund, Hong Kong (Ref No. MRP/056/20X). We thank Dr. An-ran Ran, Ziqi Tang and

References (82)

  • Tan, G.S., et al., 2017. Diabetic macular oedema. Lancet Diabet. Endocrinol.
  • Wang, X., et al., 2020. Towards multi-center glaucoma OCT image screening with semi-supervised joint structure and function multi-task learning. Med. Image Anal.
  • Wang, X., et al., 2021. Deep virtual adversarial self-training with consistency regularization for semi-supervised medical image classification. Med. Image Anal.
  • Wang, S., et al. Boundary and entropy-driven adversarial learning for fundus image segmentation.
  • Albarrak, A., Coenen, F., Zheng, Y., et al., 2013. Age-related macular degeneration identification in volumetric...
  • Alsaih, K., et al., 2017. Machine learning techniques for diabetic macular edema (DME) classification on SD-OCT images. Biomed. Eng. Online.
  • Batmanghelich, K.N., et al. Disease classification and prediction via semi-supervised dimensionality reduction.
  • Batmanghelich, N.K., et al., 2014. Generative-discriminative basis learning for medical imaging. IEEE Trans. Med. Imaging.
  • Berthelot, D., Carlini, N., Cubuk, E.D., Kurakin, A., Sohn, K., Zhang, H., Raffel, C., 2020. Remixmatch:...
  • Berthelot, D., et al. Mixmatch: A holistic approach to semi-supervised learning.
  • Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D., 2017. Unsupervised pixel-level domain adaptation with...
  • Chen, C., et al. Semantic-aware generative adversarial nets for unsupervised domain adaptation in chest x-ray segmentation.
  • Chinchor, N., et al. MUC-5 evaluation metrics.
  • Ciulla, T.A., et al., 2003. Diabetic retinopathy and diabetic macular edema: pathophysiology, screening, and novel therapies. Diabetes Care.
  • Cubuk, E.D., et al., 2019. Randaugment: Practical automated data augmentation with a reduced search space.
  • Ding, Y., et al. A semi-supervised two-stage approach to learning from noisy labels.
  • Ferris III, F.L., et al., 1984. Macular edema. A complication of diabetic retinopathy. Surv. Ophthalmol.
  • Fu, D., et al., 2016. Retinal status analysis method based on feature extraction and quantitative grading in OCT images. Biomed. Eng. Online.
  • Garvin, M.K., et al., 2008. Intraretinal layer segmentation of macular optical coherence tomography images using optimal 3-D graph search. IEEE Trans. Med. Imaging.
  • Han, J., Luo, P., Wang, X., 2019. Deep self-learning from noisy labels. In: Proceedings of the IEEE/CVF International...
  • Huang, G., Liu, S., Van der Maaten, L., Weinberger, K.Q., 2018. Condensenet: An efficient densenet using learned group...
  • Huang, D., et al., 1991. Optical coherence tomography. Science.
  • Ilse, M., et al. Attention-based deep multiple instance learning.
  • Kermany, D.S., et al., 2018. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell.
  • Kingma, D.P., Ba, J., 2015. Adam: A method for stochastic optimization. In: The Third International Conference on...
  • Laine, S., Aila, T., 2017. Temporal ensembling for semi-supervised learning. In: Fifth International Conference on...
  • Lecouat, B., et al., 2018. Semi-supervised deep learning for abnormality classification in retinal images.
  • Lee, R., et al., 2015. Epidemiology of diabetic retinopathy, diabetic macular edema and related vision loss. Eye Vis.
  • Lemaître, G., et al., 2016. Classification of SD-OCT volumes using local binary patterns: experimental validation for DME detection. J. Ophthalmol.
  • Lemaitre, G., Rastgoo, M., Massich, J., Sankar, S., Mériaudeau, F., Sidibé, D., 2015. Classification of SD-OCT volumes...
  • Li, Y., et al. Dual-consistency semi-supervised learning with uncertainty quantification for COVID-19 lesion segmentation from CT images.
