RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval

doi:10.1016/j.media.2022.102645

Medical Image Analysis

Volume 83, January 2023, 102645

Highlights

• A novel WSI retrieval algorithm called RetCCL is proposed.
• RetCCL provides interpretable results by highlighting relevant sub-regions.
• Our feature extractor (CCL) contains a weighted InfoNCE and a group-level InfoNCE.
• CCL is pretrained using a large and diverse database of histopathological images.
• CCL outperforms ImageNet-pretrained features or other SSL-based features.
• RetCCL shows state-of-the-art WSI retrieval results.

Abstract

Benefiting from the large-scale archiving of digitized whole-slide images (WSIs), computer-aided diagnosis has been well developed to assist pathologists in decision-making. Content-based WSI retrieval offers a new way to find highly correlated WSIs in an archive of previously diagnosed WSIs, with potential uses in assisted clinical diagnosis, medical research, and trainee education. During WSI retrieval, the gigapixel size of WSIs makes it particularly challenging to encode the semantic content of histopathological images and to measure the similarity between images in a way that yields interpretable results. In this work, we propose a Retrieval with Clustering-guided Contrastive Learning (RetCCL) framework for robust and accurate WSI-level image retrieval, which integrates a novel self-supervised feature learning method with a global ranking and aggregation algorithm. The proposed feature learning method makes use of existing large-scale unlabeled histopathological image data, which helps learn universal features that can be used directly for subsequent WSI retrieval tasks without extra fine-tuning. The proposed WSI retrieval method not only returns a set of WSIs similar to a query WSI, but also highlights patches or sub-regions of each WSI that share high similarity with patches of the query WSI, which helps pathologists interpret the search results. Our WSI retrieval framework has been evaluated on the tasks of anatomical site retrieval and cancer subtype retrieval using over 22,000 slides, and its performance exceeds that of other state-of-the-art methods significantly (by around 10% for anatomical site retrieval in terms of average mMV@10). In addition, patch retrieval using our learned feature representation offers a performance improvement of 24% on the TissueNet dataset in terms of mMV@5 compared with using ImageNet pre-trained features, which further demonstrates the effectiveness of the proposed CCL feature learning method.
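The abstract reports results as mMV@k without defining the metric in this excerpt. The sketch below assumes the commonly used reading, "mean majority vote at top k": a query counts as a hit when the most frequent label among its top-k retrieved items equals the query label, and the score is the fraction of hits over all queries. The function names, the tie-breaking rule, and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def majority_vote_at_k(query_label, retrieved_labels, k):
    """Return 1 if the most common label among the top-k results matches the query."""
    top_k = retrieved_labels[:k]
    if not top_k:
        return 0
    # Counter.most_common keeps first-seen order for equal counts,
    # so ties go to the label that appears earlier in the ranking (an assumption).
    winner, _ = Counter(top_k).most_common(1)[0]
    return int(winner == query_label)


def mean_majority_vote_at_k(queries, k):
    """queries: list of (query_label, ranked_retrieved_labels) pairs."""
    if not queries:
        return 0.0
    hits = sum(majority_vote_at_k(label, ranked, k) for label, ranked in queries)
    return hits / len(queries)


# Toy example with made-up labels: one query is a hit, the other is a miss.
queries = [
    ("lung", ["lung", "lung", "breast", "lung", "colon"]),
    ("breast", ["colon", "colon", "breast", "colon", "breast"]),
]
score = mean_majority_vote_at_k(queries, k=5)
```

Under this reading, a larger k makes the vote more stable but also more demanding when the database holds only a few slides of a rare class.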

Introduction

In digital pathology, glass slides are scanned into high-resolution, gigapixel whole-slide images (WSIs), which provide rich cell-level information and have been approved for clinical diagnosis (Evans et al., 2018, Mukhopadhyay et al., 2018). However, visual inspection of an entire WSI is very labor-intensive and time-consuming. Computational pathology based on deep learning has emerged to help automate pathology diagnosis, including classification of cancer types (Campanella et al., 2019, Lu et al., 2021, Xue et al., 2021), delineation of cancerous or nuclear regions (Kumar et al., 2017), survival prediction (Shao et al., 2020), and image retrieval (Kalra et al., 2020a). Benefiting from the increasing number of WSIs, WSI retrieval has recently attracted growing attention (Chen et al., 2021, Kalra et al., 2020a, Kalra et al., 2020b). Given a query WSI, it returns a series of similar WSIs from a previously characterized database. The retrieved WSIs and their associated diagnostic information are highly interpretable, which makes them useful in clinical diagnosis, medical research, and trainee education. For example, WSI retrieval can improve diagnostic accuracy (especially for rare cases) by finding cases with similar morphological features, which may serve as a virtual peer review that helps build a computational consensus.

Content-based image retrieval (CBIR) is a potential solution for medical image retrieval. It consists of two stages: image feature extraction and similar-image retrieval on a pre-built database (Hegde et al., 2019, Li et al., 2018). If the features extracted in the first stage capture the descriptive visual properties of the image, similar-image retrieval can be treated as a nearest-neighbor search problem, which means that a descriptive and robust data representation is the core of CBIR (Kalra et al., 2020a, Tizhoosh et al., 2021).
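To make the nearest-neighbor view concrete, the following minimal sketch ranks database items by cosine similarity to a query feature vector. The three-dimensional vectors and the patch names are hypothetical placeholders; a real system would use features produced by the trained extractor and, for large archives, an approximate nearest-neighbor index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, database, k):
    """Return the names of the k database entries most similar to the query."""
    scored = [(cosine_similarity(query, feat), name) for name, feat in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical pre-built database of patch features.
database = {
    "patch_a": [1.0, 0.0, 0.0],
    "patch_b": [0.9, 0.1, 0.0],
    "patch_c": [0.0, 1.0, 0.0],
}
top = retrieve_top_k([1.0, 0.05, 0.0], database, k=2)
```

The retrieval step itself is simple; the quality of the ranking depends almost entirely on how well the feature vectors separate visually and diagnostically different tissue.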

However, for content-based WSI retrieval (WSI-CBIR), the gigapixel size of WSIs makes both content feature extraction and the interpretability of search results challenging. (1) Effective feature extraction for semantic content in histopathological images is difficult because of the enormous heterogeneity within WSIs and the intra-/inter-class variations across WSIs. Moreover, a WSI-level annotation usually applies to only a tiny proportion of the tissue within the WSI (a so-called weak annotation). A pan-cancer, annotation-free feature extractor is urgently required to overcome these issues and extract robust feature representations. (2) For WSI retrieval, it is more desirable to find WSIs that contain diagnosis-relevant regions/patches than to retrieve WSIs with global similarity, and these target patches may occupy only a tiny part of the gigapixel WSI. One possible approach is to perform local patch-by-patch retrieval and then globally aggregate the patch retrieval results to return the associated similar WSIs. However, because of the sheer size of WSIs and their unbalanced tissue type distribution, it is very challenging to develop a proper global aggregation algorithm.

Current histopathological image retrieval methods usually split WSIs into patches and perform patch-level retrieval (Ma et al., 2016, Ma et al., 2018, Shi et al., 2017, Zhang et al., 2014, Zheng et al., 2017), which requires exhaustive annotation of these sub-regions and cannot be flexibly extended to WSI retrieval because efficient patch aggregation methods are lacking. An early WSI retrieval method directly concatenated all the patch features into a global WSI embedding and found similar WSIs by nearest-neighbor search. However, this overall WSI-level comparison treats all tissue types equally and fails to focus on clinically relevant sub-regions within the WSI. Two recent studies have proposed suitable patch aggregation algorithms for WSI retrieval: Yottixel (Kalra et al., 2020a, Kalra et al., 2020b) ranks WSIs with a “median-of-min” approach, and FISH (Chen et al., 2021) uses a nearest-neighbor approach based on the van Emde Boas tree. However, their features depend entirely or partly on ImageNet data, which may result in suboptimal performance because of the domain difference between natural and pathological images. Thus, an effective in-domain feature extractor is urgently required to improve feature extraction for histopathological images, ideally in an unsupervised manner. Self-supervised learning (SSL) without manual annotation has become a promising way to improve feature representations for histopathological image analysis (Dehaene et al., 2020, Koohbanani et al., 2021, Li et al., 2021, Lu et al., 2019, Srinidhi et al., 2021). However, these methods have not been trained on a large-scale and diverse domain-specific database. Meanwhile, the standard contrastive learning methods they use (e.g., SimCLR (Li et al., 2021) and MoCo (Dehaene et al., 2020)) assume that each sample is an individual instance. When applied to WSIs, this assumption may cause serious bias because of the extremely unbalanced tissue type distribution and the large proportion of similar tissues within and across WSIs. For histopathological images, negative pairs in the contrastive learning setting may be composed of highly related samples, which can confuse network training. In summary, the broader application of WSI retrieval requires robust content feature extraction in an unsupervised manner and a global aggregation approach over the local patch retrieval results to find the most similar WSIs.
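The effect of highly related negatives can be illustrated with the standard InfoNCE loss used by instance-level contrastive methods, $-\log\big(e^{q\cdot k^{+}/\tau} / (e^{q\cdot k^{+}/\tau} + \sum_{k^{-}} e^{q\cdot k^{-}/\tau})\big)$. The sketch below is not the weighted or group-level variant proposed in this work, which this excerpt does not define; the unit-length vectors and the temperature value are assumptions chosen only for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.07):
    """Standard InfoNCE loss for one query, with the positive at logit index 0."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[0]


query = [1.0, 0.0]
positive = [0.8, 0.6]  # an augmented view of the same patch

# Case 1: negatives really are dissimilar tissue.
dissimilar = [[0.0, 1.0], [-1.0, 0.0]]
# Case 2: one "negative" is near-identical tissue from another region or slide.
with_near_duplicate = [[0.0, 1.0], [0.96, 0.28]]

loss_clean = info_nce(query, positive, dissimilar)
loss_confused = info_nce(query, positive, with_near_duplicate)
```

In the second case the loss is much larger even though the query and its positive are unchanged, so the gradient pushes the network to separate two patches that show essentially the same tissue.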

To overcome the above-mentioned problems, this work proposes a WSI retrieval framework (RetCCL) based on (1) clustering-guided contrastive learning (CCL) for feature extraction and (2) distinctive query patch selection, ranking of the retrieved patches, and an aggregation algorithm for interpretable WSI search. In the first stage, we propose a CCL method to alleviate the effect of the assumption in traditional contrastive learning that each sample is an individual instance. It uses a subqueue-based weighted InfoNCE and a between-instance-based group-level InfoNCE to learn robust feature representations at both the instance level and the cluster level. In the second stage, we represent the entire WSI by a set of distinctive patches obtained with unsupervised feature-based and space-based clustering. Because of the unbalanced tissue type distribution within WSIs, we perform patch-by-patch retrieval rather than searching with the entire WSI, in order to retrieve diagnosis-relevant sub-regions/patches within similar WSIs. The retrieved patches are curated by our ranking and aggregation algorithm, which relies on an entropy-based uncertainty measurement and a cosine-similarity-based constraint. The final retrieved patches are then associated with their source WSIs to obtain the most similar WSIs. We also show that RetCCL can perform patch retrieval, directly returning a series of relevant sub-regions when a pathologist provides a sub-region as a query.
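One way to picture the curation step is sketched below. It is only a rough illustration under stated assumptions, since this excerpt gives neither the exact entropy formula nor the thresholds: for each query patch, hits below a hypothetical cosine-similarity threshold are dropped, the Shannon entropy of the remaining hits' labels is used as the uncertainty measure, query patches with high entropy are discarded, and the surviving hits vote for their source WSIs. All names, thresholds, and the similarity-weighted voting rule are assumptions.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the label distribution among retrieved patches."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def aggregate_wsis(results, min_similarity=0.7, max_entropy=1.0):
    """Rank source WSIs from per-query-patch hits.

    results: {query_patch_id: [(wsi_id, label, cosine_similarity), ...]}
    Both thresholds are hypothetical values chosen for this sketch.
    """
    votes = Counter()
    for hits in results.values():
        # Cosine-similarity constraint: ignore weak matches.
        kept = [hit for hit in hits if hit[2] >= min_similarity]
        if not kept:
            continue
        # Entropy-based uncertainty: skip query patches whose hits disagree.
        if label_entropy([label for _, label, _ in kept]) > max_entropy:
            continue
        for wsi_id, _, similarity in kept:
            votes[wsi_id] += similarity
    return [wsi_id for wsi_id, _ in votes.most_common()]


# Made-up retrieval results for two query patches.
results = {
    "q1": [("wsi_A", "LUAD", 0.95), ("wsi_B", "LUAD", 0.90), ("wsi_C", "LUSC", 0.40)],
    "q2": [("wsi_A", "LUAD", 0.85), ("wsi_D", "LUSC", 0.84),
           ("wsi_E", "BRCA", 0.83), ("wsi_F", "COAD", 0.82)],
}
ranking = aggregate_wsis(results)
```

Here the second query patch is ignored because its four hits carry four different labels (entropy of 2 bits), so only the consistent evidence from the first query patch contributes to the WSI ranking.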

The main contributions of our work can be summarized as follows:

• We propose a novel WSI retrieval algorithm called RetCCL, which includes a novel CCL-based feature extractor and a ranking and aggregation algorithm for WSI retrieval. It also provides interpretable results by highlighting the diagnosis-relevant sub-regions within WSIs, which explains the search mechanism behind our WSI retrieval algorithm.
• Our CCL-based feature extractor integrates a subqueue-based weighted InfoNCE and a between-instance-based group-level InfoNCE into traditional contrastive learning to balance the ratio of positive/negative samples and map similar images closer together.
• Our CCL pretraining is conducted on what is currently the largest histopathological image database (around 15 million patches cropped from more than 32,000 WSIs), covering diverse cell and tissue types, cancer diagnoses, and organs, which helps build a pan-cancer feature extractor for WSI-CBIR.
• Benefiting from the above designs, RetCCL outperforms existing WSI retrieval methods by a large margin. Our CCL-based feature is also superior to the ImageNet-pretrained feature and other SSL-based features, as verified in the patch retrieval experiment. Our best pretrained model has been released,¹ and it has the potential to replace the widely used ImageNet-pretrained model as a feature extractor for various histopathological image applications.

Section snippets

Related work

This section reviews the literature on self-supervised representation learning and histopathological image retrieval, the two areas most relevant to our work.

Methods

The overview of our proposed WSI retrieval framework (RetCCL) is presented in Fig. 1, which is implemented using a two-stage strategy, including the CCL-based feature extractor in Fig. 1A and the WSI retrieval process in Fig. 1B. The first stage introduces two loss functions (weighted InfoNCE and group-level InfoNCE) to help extract robust and universal features. The second stage is performed in two steps: (1) offline database construction for WSI retrieval and (2) online WSI query process. In

Experimental results and discussions

This section first introduces five datasets utilized for our CCL-based pretraining, histopathological image retrieval procedures, and downstream classification. Then, the experimental setups in the training process and evaluation metrics for the image retrieval and downstream classification are described in detail. The remaining parts cover a series of validation experiments presented in terms of patch-level retrieval, WSI-level retrieval, and downstream classification. Patch-level retrieval

Conclusion

This work proposes a histopathological image retrieval algorithm, which is applicable for both WSI-level and patch-level retrieval and can provide visually interpretable results for pathologists. Since a rich and descriptive feature is the key success factor in the image retrieval task, our work pays more attention to the design of the feature extractor. We developed a CCL-based backbone model, which is trained by integrating the multiple sub-memory banks and group-level discrimination together

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was funded in part by the National Natural Science Foundation of China (No. 61571314), the Science & Technology Department of Sichuan Province (No. 2020YFG0081), and the Innovative Youth Projects of Ocean Remote Sensing Engineering Technology Research Center of State Oceanic Administration of China (No. 2015001). We also thank Dr. Jietian Jin from the Sun Yat-sen University Cancer Center and Dr. Siteng Chen from the Shanghai Jiao Tong University School of Medicine for their help in

References (54)

Li, Z., et al., 2018. Large-scale retrieval for medical image analytics: A comprehensive review. Med. Image Anal.
Ma, Y., et al., 2018. Generating region proposals for histopathological whole slide image retrieval. Comput. Methods Programs Biomed.
Shi, X., et al., 2017. Supervised graph hashing for histopathology image retrieval and classification. Med. Image Anal.
Tizhoosh, H.R., et al., 2021. Searching images for consensus: Can AI remove observer variability in pathology? Am. J. Pathol.
Xue, Y., et al., 2021. Selective synthetic augmentation with HistoGAN for improved histopathology image classification. Med. Image Anal.
Zhang, X., et al., 2014. Towards large-scale histopathological image analysis: Hashing-based image retrieval. IEEE Trans. Med. Imaging
Akakin, H.C., et al., 2012. Content-based microscopic image retrieval system for multi-image queries. IEEE Trans. Inf. Technol. Biomed.
Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., Loh, A., Karthikesalingam, A., Kornblith, S.,...
Barbano, C.A., et al., 2021. UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading.
Campanella, G., et al., 2019. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med.
Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A., 2020. Unsupervised Learning of Visual Features...
Chen, X., et al., 2020. Improved baselines with momentum contrastive learning.
Chen, X., He, K., 2020. Exploring Simple Siamese Representation Learning. In: CVPR. pp....
Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020b. A Simple Framework for Contrastive Learning of Visual...
Chen, C., et al., 2021. Fast and scalable image search for histology.
Cooper, L.A.D., et al., 2018. PanCancer insights from the cancer genome atlas: the pathologist’s perspective. J. Pathol.
Dehaene, O., et al., 2020. Self-supervision closes the gap between weak and strong supervision in histology.
Doersch, C., Gupta, A., Efros, A.A., 2015. Unsupervised Visual Representation Learning by Context Prediction. In: ICCV....
Evans, A.J., et al., 2018. US food and drug administration approval of whole slide imaging for primary diagnosis: a key milestone is reached and new questions are raised. Arch. Pathol. Lab. Med.
Foster, A., Pukdee, R., Rainforth, T., 2021. Improving Transformation Invariance in Contrastive Representation...
Gidaris, S., et al., 2018. Unsupervised representation learning by predicting image rotations.
Grill, J.-B., et al. Bootstrap your own latent: A new approach to self-supervised learning.
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R., 2020. Momentum Contrast for Unsupervised Visual Representation...
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: CVPR. pp....
Hegde, N., et al., 2019. Similar image search for histopathology: SMILY. Npj Digit. Med.
Ishii, T., et al., 2011. Tubular adenomas with minor villous changes show molecular features characteristic of tubulovillous adenomas. Am. J. Surg. Pathol.
Kalra, S., et al., 2020. Yottixel - An image search engine for large archives of histopathology whole slide images. Med. Image Anal.