Elsevier

Medical Image Analysis

Volume 83, January 2023, 102645

RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval

https://doi.org/10.1016/j.media.2022.102645

Highlights

  • A novel WSI retrieval algorithm called RetCCL is proposed.

  • RetCCL provides interpretable results by highlighting relevant sub-regions.

  • Our feature extractor (CCL) contains a weighted InfoNCE and a group-level InfoNCE.

  • CCL is pretrained using a large and diverse database of histopathological images.

  • CCL outperforms ImageNet-pretrained features and other SSL-based features.

  • RetCCL shows state-of-the-art WSI retrieval results.

Abstract

Benefiting from the large-scale archiving of digitized whole-slide images (WSIs), computer-aided diagnosis has been well developed to assist pathologists in decision-making. Content-based WSI retrieval offers a new way to find highly correlated WSIs in an archive of previously diagnosed WSIs, with potential uses in assisted clinical diagnosis, medical research, and trainee education. Because WSIs are gigapixel in size, it is particularly challenging during retrieval to encode the semantic content of histopathological images and to measure the similarity between images in a way that yields interpretable results. In this work, we propose a Retrieval with Clustering-guided Contrastive Learning (RetCCL) framework for robust and accurate WSI-level image retrieval, which integrates a novel self-supervised feature learning method with a global ranking and aggregation algorithm to improve retrieval performance. The proposed feature learning method makes use of existing large-scale unlabeled histopathological image data to learn universal features that can be used directly for subsequent WSI retrieval tasks without extra fine-tuning. The proposed WSI retrieval method not only returns a set of WSIs similar to a query WSI, but also highlights the patches or sub-regions of each returned WSI that are highly similar to patches of the query WSI, which helps pathologists interpret the search results. Our WSI retrieval framework has been evaluated on anatomical site retrieval and cancer subtype retrieval using over 22,000 slides, and it significantly outperforms other state-of-the-art methods (by around 10% for anatomical site retrieval in terms of average mMV@10). In addition, patch retrieval using our learned feature representation improves mMV@5 on the TissueNet dataset by 24% compared with ImageNet pre-trained features, which further demonstrates the effectiveness of the proposed CCL feature learning method.

Introduction

In digital pathology, glass slides are scanned into high-resolution, gigapixel whole-slide images (WSIs), which provide rich cell-level information and have been approved for clinical diagnosis (Evans et al., 2018, Mukhopadhyay et al., 2018). However, visual inspection of an entire WSI is labor-intensive and time-consuming. Computational pathology based on deep learning has emerged to help automate pathology diagnosis, including classification of cancer types (Campanella et al., 2019, Lu et al., 2021, Xue et al., 2021), delineation of cancerous or nuclear regions (Kumar et al., 2017), survival prediction (Shao et al., 2020), and image retrieval (Kalra et al., 2020a). With the growing number of available WSIs, WSI retrieval has recently attracted increasing attention (Chen et al., 2021, Kalra et al., 2020a, Kalra et al., 2020b): given a query WSI, it returns a series of similar WSIs from a previously characterized database. The retrieved WSIs and their associated diagnostic information make the results highly interpretable, which makes WSI retrieval useful in clinical diagnosis, medical research, and trainee education. For example, WSI retrieval can improve diagnostic accuracy (especially for rare cases) by finding cases with similar morphological features, which may serve as a virtual peer review and help build a computational consensus.

Content-based image retrieval (CBIR) is a potential solution for medical image retrieval. It consists of two stages: image feature extraction and similar-image retrieval on a pre-built database (Hegde et al., 2019, Li et al., 2018). If the features extracted in the first stage capture the descriptive visual properties of the image, similar-image retrieval can be treated as a nearest-neighbor search problem. A descriptive and robust data representation is therefore the core of CBIR (Kalra et al., 2020a, Tizhoosh et al., 2021).
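The nearest-neighbor view of the second CBIR stage can be sketched in a few lines. This is a minimal illustration only: the three-dimensional vectors stand in for features produced by some extractor, and the names `cosine_similarity`, `retrieve_top_k`, and the image identifiers are hypothetical, not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, database, k=3):
    """Return the k (image_id, similarity) pairs most similar to the query."""
    scored = [
        (image_id, cosine_similarity(query, feature))
        for image_id, feature in database.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy pre-built database: image id -> feature vector (assumed values).
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
top = retrieve_top_k([1.0, 0.05, 0.0], database, k=2)
print([image_id for image_id, _ in top])
```

A brute-force scan like this is only practical for small databases; the quality of the result depends entirely on how well the feature vectors describe the image content, which is the point the paragraph above makes.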

However, for content-based WSI retrieval (WSI-CBIR), the gigapixel size of WSIs makes both feature extraction and the interpretability of search results challenging. (1) Extracting effective features for the semantic content of histopathological images is difficult because of the enormous heterogeneity within WSIs and the intra-/inter-class variations across WSIs. Moreover, a WSI-level annotation usually applies to only a tiny proportion of the tissue within the WSI (a so-called weak annotation). A pan-cancer, annotation-free feature extractor is urgently required to overcome these issues and extract robust feature representations. (2) For WSI retrieval, it is more desirable to find WSIs that contain diagnosis-relevant regions or patches than to retrieve WSIs that are only globally similar, and these target patches may occupy only a tiny part of the gigapixel WSI. A possible approach is to perform local patch-by-patch retrieval and then globally aggregate the patch retrieval results to return the associated similar WSIs. However, due to the sheer size of WSIs and their unbalanced tissue type distribution, it is very challenging to develop a proper global aggregation algorithm.

Current histopathological image retrieval methods usually split WSIs into patches and perform patch-level retrieval (Ma et al., 2016, Ma et al., 2018, Shi et al., 2017, Zhang et al., 2014, Zheng et al., 2017), which requires exhaustive annotation of these sub-regions and cannot be flexibly extended to WSI retrieval because efficient patch aggregation methods are lacking. An early WSI retrieval method directly concatenated all patch features into a global WSI embedding and found similar WSIs by nearest-neighbor search. However, this overall WSI-level comparison treats all tissue types equally and fails to focus on clinically relevant sub-regions within the WSI. Two recent studies have proposed patch aggregation algorithms suited to WSI retrieval: Yottixel (Kalra et al., 2020a, Kalra et al., 2020b) ranks WSIs through a “median-of-min” approach, whereas FISH (Chen et al., 2021) uses a nearest-neighbor approach based on the van Emde Boas tree. However, their features depend entirely or partly on ImageNet data, which may result in suboptimal performance due to the domain difference between natural and pathological images. Thus, an effective in-domain feature extractor, ideally trained in an unsupervised manner, is urgently required to improve feature extraction for histopathological images. Self-supervised learning (SSL), which needs no manual annotation, has become a promising way to improve feature representations for histopathological image analysis (Dehaene et al., 2020, Koohbanani et al., 2021, Li et al., 2021, Lu et al., 2019, Srinidhi et al., 2021). However, these methods have not been trained on a large-scale and diverse domain-specific database. Moreover, the standard contrastive learning methods they use (e.g., SimCLR (Li et al., 2021) and MoCo (Dehaene et al., 2020)) assume that each sample is an individual instance. When applied to WSIs, this assumption may cause serious bias because of the extremely unbalanced tissue type distribution and the large portion of similar tissues within and across WSIs. For histopathological images, negative pairs in the contrastive learning setting may consist of highly related samples, which could confuse network training. In summary, broader application of WSI retrieval requires robust, unsupervised content feature extraction and a global aggregation approach that operates on the local patch retrieval results to find the most similar WSIs.
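To make the “median-of-min” idea concrete, the following is a rough sketch of how such a slide-level score could be computed from patch features. It reflects our reading of the general approach rather than Yottixel's actual implementation: Euclidean distance on toy 2-D vectors is an assumption chosen for brevity, and `median_of_min`, `rank_slides`, and the slide identifiers are hypothetical names.

```python
import math
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def median_of_min(query_patches, candidate_patches, dist=euclidean):
    """For each query patch, take the distance to its closest candidate
    patch, then return the median of those minima (lower = more similar)."""
    minima = [
        min(dist(q, c) for c in candidate_patches) for q in query_patches
    ]
    return median(minima)


def rank_slides(query_patches, archive):
    """Order slide ids in the archive from most to least similar."""
    scores = {
        slide_id: median_of_min(query_patches, patches)
        for slide_id, patches in archive.items()
    }
    return sorted(scores, key=scores.get)


# Toy data (assumed): each slide is a short list of patch feature vectors.
query = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
archive = {
    "slide_1": [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0]],
    "slide_2": [[3.0, 3.0], [4.0, 4.0]],
}
print(rank_slides(query, archive))
```

Taking the median rather than the mean keeps a few poorly matched query patches from dominating the slide-level score, which matters when only part of a slide is diagnostically relevant.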

To overcome the above-mentioned problems, this work proposes a WSI retrieval framework (RetCCL) based on (1) clustering-guided contrastive learning (CCL) for feature extraction and (2) distinctive query patch selection, ranking of the retrieved patches, and an aggregation algorithm for interpretable WSI search. In the first stage, we propose a CCL method to alleviate the effect of the unfair assumption in traditional contrastive learning: we use a subqueue-based weighted InfoNCE and a between-instance-based group-level InfoNCE to learn robust feature representations at both the instance level and the cluster level. In the second stage, we represent the entire WSI by a combination of distinctive patches obtained through unsupervised feature-based and space-based clustering. Because of the unbalanced tissue type distribution within WSIs, we perform patch-by-patch retrieval rather than whole-WSI search in order to retrieve diagnosis-relevant sub-regions or patches within similar WSIs. The retrieved patches are curated by our ranking and aggregation algorithm, which relies on an entropy-based uncertainty measurement and a cosine-similarity-based constraint. The final retrieved patches are then traced back to their source WSIs to obtain the most similar WSIs. We also show that RetCCL can perform patch retrieval, directly returning a series of relevant sub-regions when a pathologist provides a sub-region as a query.
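The exact ranking and aggregation rule is not given in this excerpt, so the following is only one plausible reading of how an entropy-based uncertainty measure and a cosine-similarity constraint might be combined. Everything here is an assumption made for illustration: the thresholds `max_entropy` and `min_similarity`, the vote-counting aggregation, the function names, and the toy labels.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the label distribution of one
    query patch's retrieved neighbors; 0 means all neighbors agree."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def aggregate_slides(results, max_entropy=1.0, min_similarity=0.7, top_n=2):
    """Assumed aggregation: skip query patches whose neighbors disagree
    too much, drop weakly similar neighbors, then count votes per slide.

    `results` holds one list per query patch, each containing
    (slide_id, label, cosine_similarity) tuples.
    """
    votes = Counter()
    for hits in results:
        if not hits:
            continue
        if label_entropy([label for _, label, _ in hits]) > max_entropy:
            continue  # uncertain query patch: its neighbors are ignored
        for slide_id, _, similarity in hits:
            if similarity >= min_similarity:
                votes[slide_id] += 1
    return [slide_id for slide_id, _ in votes.most_common(top_n)]


# Toy retrieval results for three query patches (assumed values).
results = [
    [("s1", "LUAD", 0.95), ("s1", "LUAD", 0.90), ("s2", "LUAD", 0.85)],
    [("s3", "LUSC", 0.80), ("s2", "LUAD", 0.75), ("s4", "BRCA", 0.60)],
    [("s2", "LUAD", 0.88), ("s2", "LUAD", 0.72), ("s5", "LUSC", 0.90)],
]
print(aggregate_slides(results))
```

In this toy run the second query patch is discarded because its three neighbors carry three different labels, so only the first and third patches contribute votes to the slide ranking.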

The main contributions of our work can be summarized as follows:

  • We propose a novel WSI retrieval algorithm called RetCCL, which includes a novel CCL-based feature extractor and a ranking and aggregation algorithm for WSI retrieval. It also provides interpretable results by highlighting the diagnosis-relevant sub-regions within WSIs, which helps explain the search mechanism behind our WSI retrieval algorithm.

  • Our CCL-based feature extractor integrates a subqueue-based weighted InfoNCE and a between-instance-based group-level InfoNCE into traditional contrastive learning to balance the ratio of positive to negative samples and map similar images closer together.

  • Our CCL pretraining uses what is currently the largest histopathological image database (around 15 million patches cropped from more than 32,000 WSIs), covering diverse cell and tissue types, cancer diagnoses, and organs, which helps build a pan-cancer feature extractor for WSI-CBIR.

  • Benefiting from the above designs, RetCCL outperforms existing WSI retrieval methods by a large margin. Our CCL-based features are also superior to ImageNet-pretrained features and other SSL-based features, as verified in the patch retrieval experiment. Our best pretrained model has been released and has the potential to replace the widely used ImageNet-pretrained model as a feature extractor for various histopathological image applications.
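For orientation, the standard InfoNCE loss on which the weighted and group-level variants build is commonly written as below. This is the generic form used in MoCo-style contrastive learning, not the paper's weighted or group-level formulation, which is not reproduced in this excerpt. The notation is ours: $q$ is the encoded query, $k_{+}$ its positive key, $k_{1}, \dots, k_{K}$ are the negative keys, and $\tau$ is a temperature.

```latex
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\log
    \frac{\exp(q \cdot k_{+} / \tau)}
         {\exp(q \cdot k_{+} / \tau) + \sum_{i=1}^{K} \exp(q \cdot k_{i} / \tau)}
```

Every $k_{i}$ is treated as a negative regardless of how similar its tissue content is to the query, which is the instance-level assumption that the contributions listed above aim to relax.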

Section snippets

Related work

This section reviews the literature on self-supervised representation learning and histopathological image retrieval, the two areas most relevant to our work.

Methods

The overview of our proposed WSI retrieval framework (RetCCL) is presented in Fig. 1, which is implemented using a two-stage strategy, including the CCL-based feature extractor in Fig. 1A and the WSI retrieval process in Fig. 1B. The first stage introduces two loss functions (weighted InfoNCE and group-level InfoNCE) to help extract robust and universal features. The second stage is performed in two steps: (1) offline database construction for WSI retrieval and (2) online WSI query process. In

Experimental results and discussions

This section first introduces five datasets utilized for our CCL-based pretraining, histopathological image retrieval procedures, and downstream classification. Then, the experimental setups in the training process and evaluation metrics for the image retrieval and downstream classification are described in detail. The remaining parts cover a series of validation experiments presented in terms of patch-level retrieval, WSI-level retrieval, and downstream classification. Patch-level retrieval

Conclusion

This work proposes a histopathological image retrieval algorithm, which is applicable for both WSI-level and patch-level retrieval and can provide visually interpretable results for pathologists. Since a rich and descriptive feature is the key success factor in the image retrieval task, our work pays more attention to the design of the feature extractor. We developed a CCL-based backbone model, which is trained by integrating the multiple sub-memory banks and group-level discrimination together

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was in part funded by the National Natural Science Foundation of China (No. 61571314), Science & technology department of Sichuan Province (No. 2020YFG0081), and the Innovative Youth Projects of Ocean Remote Sensing Engineering Technology Research Center of State Oceanic Administration of China (No. 2015001). We also thank Dr. Jietian Jin from the Sun Yat-sen University Cancer Center and Dr. Siteng Chen from the Shanghai Jiao Tong University School of Medicine for their help in

References (54)

  • Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A., 2020. Unsupervised Learning of Visual Features...
  • Chen, X., et al., 2020. Improved baselines with momentum contrastive learning.
  • Chen, X., He, K., 2020. Exploring Simple Siamese Representation Learning. In: CVPR. pp....
  • Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020b. A Simple Framework for Contrastive Learning of Visual...
  • Chen, C., et al., 2021. Fast and scalable image search for histology.
  • Cooper, L.A.D., et al., 2018. PanCancer insights from the cancer genome atlas: the pathologist’s perspective. J. Pathol.
  • Dehaene, O., et al., 2020. Self-supervision closes the gap between weak and strong supervision in histology.
  • Doersch, C., Gupta, A., Efros, A.A., 2015. Unsupervised Visual Representation Learning by Context Prediction. In: ICCV....
  • Evans, A.J., et al., 2018. US food and drug administration approval of whole slide imaging for primary diagnosis: a key milestone is reached and new questions are raised. Arch. Pathol. Lab. Med.
  • Foster, A., Pukdee, R., Rainforth, T., 2021. Improving Transformation Invariance in Contrastive Representation...
  • Gidaris, S., et al., 2018. Unsupervised representation learning by predicting image rotations.
  • Grill, J.-B., et al. Bootstrap your own latent: A new approach to self-supervised learning.
  • He, K., Fan, H., Wu, Y., Xie, S., Girshick, R., 2020. Momentum Contrast for Unsupervised Visual Representation...
  • He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: CVPR. pp....
  • Hegde, N., et al., 2019. Similar image search for histopathology: SMILY. Npj Digit. Med.
  • Ishii, T., et al., 2011. Tubular adenomas with minor villous changes show molecular features characteristic of tubulovillous adenomas. Am. J. Surg. Pathol.
  • Kalra, S., et al., 2020. Yottixel - An image search engine for large archives of histopathology whole slide images. Med. Image Anal.