RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval
Introduction
In digital pathology, glass slides are scanned into high-resolution, gigapixel whole-slide images (WSIs), which provide rich cell-level information and have been approved for clinical diagnosis (Evans et al., 2018, Mukhopadhyay et al., 2018). However, visual inspection of an entire WSI is labor-intensive and time-consuming. Computational pathology based on deep learning has emerged to help automate pathology diagnosis, including classification of cancer types (Campanella et al., 2019, Lu et al., 2021, Xue et al., 2021), delineation of cancerous or nuclear regions (Kumar et al., 2017), survival prediction (Shao et al., 2020), and image retrieval (Kalra et al., 2020a). Benefiting from the increasing number of available WSIs, WSI retrieval has recently attracted growing attention (Chen et al., 2021, Kalra et al., 2020a, Kalra et al., 2020b): given a query WSI, it returns a series of similar WSIs from a historically characterized database. The retrieved WSIs, together with their associated diagnostic information, offer high interpretability, which makes WSI retrieval useful in clinical diagnosis, medical research, and trainee education. For example, WSI retrieval can improve diagnostic accuracy (especially for rare cases) by finding cases with similar morphological features, which may serve as a virtual peer review that helps build a computational consensus.
Content-based image retrieval (CBIR) is a potential solution for medical image retrieval. It consists of two stages: image feature extraction and similar-image retrieval on a pre-built database (Hegde et al., 2019, Li et al., 2018). If the features extracted in the first stage capture the descriptive visual properties of an image, similar-image retrieval can be regarded as a nearest-neighbor search problem, which means that learning a descriptive and robust data representation is the core of the CBIR task (Kalra et al., 2020a, Tizhoosh et al., 2021).
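The two CBIR stages can be illustrated with a minimal sketch. It assumes that feature vectors have already been extracted by some encoder (random vectors stand in for them here) and uses cosine similarity with brute-force search; the function names `build_database` and `retrieve` are hypothetical and are not taken from the paper.

```python
import numpy as np

def build_database(features):
    """Offline stage: L2-normalize features so a dot product equals cosine similarity."""
    features = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)

def retrieve(query, database, top_k=3):
    """Online stage: return indices and similarities of the top_k nearest neighbors."""
    q = np.asarray(query, dtype=np.float64)
    q = q / max(np.linalg.norm(q), 1e-12)
    sims = database @ q                                  # cosine similarity to every entry
    order = np.argsort(-sims, kind="stable")[:top_k]     # most similar first
    return order, sims[order]

# Stand-in features: 100 database images with 16-dimensional embeddings.
rng = np.random.default_rng(0)
db_features = rng.normal(size=(100, 16))
database = build_database(db_features)

# A query that is a slightly perturbed copy of database entry 42.
query = db_features[42] + 0.01 * rng.normal(size=16)
indices, scores = retrieve(query, database)
print(indices[0])  # the perturbed source image should rank first
```

With a descriptive feature extractor, the quality of the ranking depends almost entirely on the embedding, which is why the representation is treated as the core of the task.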
However, for content-based WSI retrieval (WSI-CBIR), the gigapixel size of WSIs makes both content feature extraction and the interpretability of search results challenging. (1) Effective feature extraction for semantic content in histopathological images is difficult because of the enormous heterogeneity within WSIs and the intra-/inter-class variations across WSIs. Moreover, a WSI-level annotation usually targets only a tiny proportion of the tissue within the WSI (a so-called weak annotation). A pan-cancer, annotation-free feature extractor is therefore needed to obtain robust feature representations. (2) For WSI retrieval, it is more desirable to find WSIs that contain diagnosis-relevant regions/patches than to retrieve WSIs with global similarity, and these target patches may occupy only a tiny part of the gigapixel WSI. One possible approach is to perform local patch-by-patch retrieval and then globally aggregate the patch retrieval results to return the associated similar WSIs. However, because of the sheer size of WSIs and their unbalanced tissue type distribution, developing a proper global aggregation algorithm is itself very challenging.
Current histopathological image retrieval methods usually split WSIs into patches and perform patch-level retrieval (Ma et al., 2016, Ma et al., 2018, Shi et al., 2017, Zhang et al., 2014, Zheng et al., 2017), which requires exhaustive annotation of these sub-regions and cannot be flexibly extended to WSI retrieval because efficient patch aggregation methods are lacking. An early WSI retrieval method directly concatenated all the patch features into a global WSI embedding and found similar WSIs by nearest-neighbor search. However, this overall WSI-level comparison treats all tissue types equally and fails to focus on clinically relevant sub-regions within the WSI. Two recent studies have proposed suitable patch aggregation algorithms for WSI retrieval: Yottixel (Kalra et al., 2020a, Kalra et al., 2020b) retrieves WSIs through a “median-of-min” ranking approach, and FISH (Chen et al., 2021) developed a nearest-neighbor approach based on the Van Emde Boas tree. However, their features depend entirely or partly on ImageNet data, which may result in suboptimal performance because of the domain difference between natural and pathological images. Thus, an effective in-domain feature extractor is needed to improve feature extraction for histopathological images, ideally in an unsupervised manner.
Self-supervised learning (SSL), which requires no manual annotation, has become a promising way to improve feature representations for histopathological image analysis (Dehaene et al., 2020, Koohbanani et al., 2021, Li et al., 2021, Lu et al., 2019, Srinidhi et al., 2021). However, these methods have not been trained on a large-scale and diverse domain-specific database. Moreover, the standard contrastive learning methods they use (e.g., SimCLR (Li et al., 2021) and MoCo (Dehaene et al., 2020)) assume that each sample is an individual instance. When applied to WSIs, this assumption may cause serious bias because of the extremely unbalanced tissue type distribution and the large portion of similar tissues within and across WSIs. For histopathological images, negative pairs in the contrastive learning setting may be composed of highly related samples, which can confuse the network training process. In summary, the broader application of WSI retrieval requires robust content feature extraction in an unsupervised manner and a global aggregation approach over the local patch retrieval results to find the most similar WSIs.
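The effect of highly related negatives can be seen in a small sketch of the standard InfoNCE loss for a single query, as used in instance-level contrastive learning. This is the generic formulation, not the weighted or group-level variants proposed in this work; the temperature value and the toy vectors are illustrative assumptions.

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.07):
    """Standard InfoNCE for one query: cross-entropy over [positive, negatives]."""
    def normalize(x):
        x = np.asarray(x, dtype=np.float64)
        return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)

    q, k_pos, k_neg = normalize(query), normalize(positive), normalize(negatives)
    # First logit is the positive pair, the rest are the negatives.
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / temperature
    logits = logits - logits.max()                       # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

query = [1.0, 0.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0, 0.0]                          # augmented view of the query

# Case 1: all negatives are unrelated to the query.
unrelated = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# Case 2: one "negative" is a patch of the same tissue type, i.e. nearly
# identical to the query, yet it is still pushed away as a negative.
related = [[0.9, 0.0, 0.1, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

loss_unrelated = info_nce(query, positive, unrelated)
loss_related = info_nce(query, positive, related)
print(loss_unrelated < loss_related)  # the related negative is penalized as a false negative
```

In the second case the loss stays high even though the embedding is already reasonable, so the gradient pushes apart two patches that are semantically alike, which is the bias described above.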
To overcome the above-mentioned problems, this work proposes a WSI retrieval framework (RetCCL) based on (1) clustering-guided contrastive learning (CCL) for feature extraction and (2) distinctive query patch selection combined with a ranking and aggregation algorithm for the retrieved patches, which enables interpretable WSI search. In the first stage, we propose a CCL method to alleviate the effect of the unfair individual-instance assumption in traditional contrastive learning, using a subqueue-based weighted InfoNCE and a between-instance-based group-level InfoNCE to learn robust feature representations at both the instance level and the cluster level. In the second stage, we represent the entire WSI by a set of distinctive patches obtained with unsupervised feature-based and space-based clustering. Because of the unbalanced tissue type distribution within WSIs, we perform patch-by-patch retrieval instead of whole-WSI search to retrieve diagnosis-relevant sub-regions/patches within similar WSIs. The retrieved patches are curated by our ranking and aggregation algorithm, which relies on an entropy-based uncertainty measurement and a cosine-similarity-based constraint. The final retrieved patches are then associated with their source WSIs to obtain the most similar WSIs. Additionally, we show that RetCCL can perform patch retrieval and directly return a series of relevant sub-regions when pathologists provide a sub-region as a query.
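The idea of curating retrieved patches by entropy and cosine similarity and then mapping them back to their source WSIs can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the thresholds `max_entropy` and `min_similarity`, the scoring by summed similarity, the function names, and the labels and WSI identifiers are all hypothetical.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (in bits) of the diagnosis labels among retrieved patches."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def aggregate(results_per_query, max_entropy=0.9, min_similarity=0.7, top_n=2):
    """Rank source WSIs from per-query-patch retrieval results.

    results_per_query: one list per query patch, each holding
    (wsi_id, label, cosine_similarity) tuples for the retrieved patches.
    """
    scores = Counter()
    for hits in results_per_query:
        if not hits:
            continue
        # Uncertainty check: skip query patches whose neighbors disagree on the label.
        if label_entropy([label for _, label, _ in hits]) > max_entropy:
            continue
        for wsi_id, _, similarity in hits:
            # Similarity constraint: ignore weakly matching patches.
            if similarity >= min_similarity:
                scores[wsi_id] += similarity
    return [wsi_id for wsi_id, _ in scores.most_common(top_n)]

results = [
    [("wsi_A", "type_1", 0.95), ("wsi_A", "type_1", 0.90), ("wsi_B", "type_1", 0.85)],
    [("wsi_C", "type_2", 0.92), ("wsi_D", "type_1", 0.91), ("wsi_E", "type_3", 0.90)],
    [("wsi_B", "type_1", 0.80), ("wsi_F", "type_1", 0.60), ("wsi_A", "type_1", 0.75)],
]
top_wsis = aggregate(results)
print(top_wsis)  # ['wsi_A', 'wsi_B']
```

The second query patch is discarded because its three neighbors carry three different labels (entropy of about 1.58 bits), and the patch from `wsi_F` is dropped by the similarity constraint, so only consistent, close matches contribute to the WSI ranking.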
The main contributions of our work can be summarized as follows:
- We propose a novel WSI retrieval algorithm called RetCCL, which includes a novel CCL-based feature extractor and a ranking and aggregation algorithm for WSI retrieval. It also provides interpretable results by highlighting the diagnosis-relevant sub-regions within WSIs, which explains the search mechanism behind our WSI retrieval algorithm.
- Our CCL-based feature extractor integrates a subqueue-based weighted InfoNCE and a between-instance-based group-level InfoNCE into traditional contrastive learning to balance the ratio of positive/negative samples and map similar images closer together.
- Our CCL pretraining is conducted on what is currently the largest histopathological image database (around 15 million patches cropped from more than 32,000 WSIs), covering diverse cell and tissue types, cancer diagnoses, and organs, which helps yield a pan-cancer feature extractor for WSI-CBIR.
- Benefiting from the above designs, RetCCL outperforms existing WSI retrieval methods by a large margin. Our CCL-based features are also superior to ImageNet-pretrained features and other SSL-based features, as verified in the patch retrieval experiment. Our best pretrained model has been released and has the potential to serve as a new feature extractor for various histopathological image applications, replacing the currently widely used ImageNet-pretrained model.
Related work
This section reviews the literature on self-supervised representation learning and histopathological image retrieval, the two areas most relevant to our work.
Methods
The overview of our proposed WSI retrieval framework (RetCCL) is presented in Fig. 1, which is implemented using a two-stage strategy, including the CCL-based feature extractor in Fig. 1A and the WSI retrieval process in Fig. 1B. The first stage introduces two loss functions (weighted InfoNCE and group-level InfoNCE) to help extract robust and universal features. The second stage is performed in two steps: (1) offline database construction for WSI retrieval and (2) online WSI query process. In
Experimental results and discussions
This section first introduces five datasets utilized for our CCL-based pretraining, histopathological image retrieval procedures, and downstream classification. Then, the experimental setups in the training process and evaluation metrics for the image retrieval and downstream classification are described in detail. The remaining parts cover a series of validation experiments presented in terms of patch-level retrieval, WSI-level retrieval, and downstream classification. Patch-level retrieval
Conclusion
This work proposes a histopathological image retrieval algorithm, which is applicable for both WSI-level and patch-level retrieval and can provide visually interpretable results for pathologists. Since a rich and descriptive feature is the key success factor in the image retrieval task, our work pays more attention to the design of the feature extractor. We developed a CCL-based backbone model, which is trained by integrating the multiple sub-memory banks and group-level discrimination together
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research was funded in part by the National Natural Science Foundation of China (No. 61571314), the Science & Technology Department of Sichuan Province (No. 2020YFG0081), and the Innovative Youth Projects of Ocean Remote Sensing Engineering Technology Research Center of State Oceanic Administration of China (No. 2015001). We also thank Dr. Jietian Jin from the Sun Yat-sen University Cancer Center and Dr. Siteng Chen from the Shanghai Jiao Tong University School of Medicine for their help in
References (54)
- et al. Large-scale retrieval for medical image analytics: A comprehensive review. Med. Image Anal. (2018)
- et al. Generating region proposals for histopathological whole slide image retrieval. Comput. Methods Programs Biomed. (2018)
- et al. Supervised graph hashing for histopathology image retrieval and classification. Med. Image Anal. (2017)
- et al. Searching images for consensus: Can AI remove observer variability in pathology? Am. J. Pathol. (2021)
- et al. Selective synthetic augmentation with HistoGAN for improved histopathology image classification. Med. Image Anal. (2021)
- et al. Towards large-scale histopathological image analysis: Hashing-based image retrieval. IEEE Trans. Med. Imaging (2014)
- et al. Content-based microscopic image retrieval system for multi-image queries. IEEE Trans. Inf. Technol. Biomed. (2012)
- Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., Loh, A., Karthikesalingam, A., Kornblith, S.,...
- et al. UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading. (2021)
- et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. (2019)
- Improved baselines with momentum contrastive learning.
- Fast and scalable image search for histology.
- PanCancer insights from the cancer genome atlas: the pathologist’s perspective. J. Pathol.
- Self-supervision closes the gap between weak and strong supervision in histology.
- US food and drug administration approval of whole slide imaging for primary diagnosis: a key milestone is reached and new questions are raised. Arch. Pathol. Lab. Med.
- Unsupervised representation learning by predicting image rotations.
- Bootstrap your own latent: A new approach to self-supervised learning.
- Similar image search for histopathology: SMILY. Npj Digit. Med.
- Tubular adenomas with minor villous changes show molecular features characteristic of tubulovillous adenomas. Am. J. Surg. Pathol.
- Yottixel - An image search engine for large archives of histopathology whole slide images. Med. Image Anal.