Abstract
The rapid development of Artificial Intelligence (AI) technology accelerates the application of computational pathology in clinical decision-making. Due to the restriction of computing resources and annotation information, it is challenging for AI-based computational pathology methods to effectively process and analyze the gigapixel whole slide image (WSI). Conventional methods utilize multiple instance learning (MIL) to convert WSI into patches for classification. However, without the patch-level annotation, it is difficult to extract discriminative features, even with pre-trained networks. Furthermore, forcibly applying the patch-level conversion will break the pathological characteristics of WSI from the spatial structure. In this study, we present a two-stage framework named Compressed WSI Classification (CWC-Transformer) to effectively solve the problems of feature extraction and spatial information loss in WSI classification. In the compression stage, we adopt contrastive learning to present a feature compression method, which not only extracts the discriminative features but also decreases the data deviation caused by staining and scanning inconsistency. In the learning stage, we extend the advantages of the convolutional neural network and transformer mechanism to enhance the co-relations between local and global information to provide the final results jointly. Experiments on three large-scale public datasets of different tasks show that our proposed framework outperforms other advanced methods in terms of robustness and interpretation.





Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Organization WH, et al (2019) International agency for research on cancer
Ying X, Monticello TM (2006) Modern imaging technologies in toxicologic pathology: an overview. Toxicol. pathol. 34(7):815–826
Yang Y, Hu Y, Zhang X, Wang S (2021) Two-stage selective ensemble of CNN via deep tree training for medical image classification. IEEE Trans Cybern 52(9):9194–9207
Yang Y, Jiang J (2018) Adaptive bi-weighting toward automatic initialization and model selection for hmm-based hybrid meta-clustering ensembles. IEEE Trans cybern 49(5):1657–1668
Yang Y, Jiang J (2018) Bi-weighted ensemble via hmm-based approaches for temporal data clustering. Pattern Recogn 76:391–403
Madabhushi A, Lee G (2016) Image analysis and machine learning in digital pathology: challenges and opportunities. Med Image Anal 33:170–175
Ghaznavi F, Evans A, Madabhushi A, Feldman M (2013) Digital imaging in pathology: whole-slide imaging and beyond. Ann Rev Pathol: Mech Disease 8:331–359
Pu B, Li K, Li S, Zhu N (2021) Automatic fetal ultrasound standard plane recognition based on deep learning and IIoT. IEEE Trans. Ind. Inf. 17(11):7771–7780
Zhao L, Li K, Pu B, Chen J, Li S, Liao X (2022) An ultrasound standard plane detection model of fetal head based on multi-task learning and hybrid knowledge graph. Fut. Gen. Comput. Syst. 135:234–243
Chen J, Yang N, Zhou M, Zhang Z, Yang X (2022) A configurable deep learning framework for medical image analysis. Neural Comput. Appl. 34(10):7375–7392
Hou L, Samaras D, Kurc TM, Gao Y, Davis JE, Saltz JH (2016) Patch-based convolutional neural network for whole slide tissue image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2424–2433
Liu Y, Gadepalli K, Norouzi M, Dahl GE, Kohlberger T, Boyko A, Venugopalan S, Timofeev A, Nelson PQ, Corrado GS, et al (2017) Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442
Zhou Z-H (2018) A brief introduction to weakly supervised learning. Natl Sci Rev 5(1):44–53
Chikontwe P, Kim M, Nam SJ, Go H, Park SH (2020) Multiple instance learning with center embeddings for histopathology classification. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 519–528
Hashimoto N, Fukushima D, Koga R, Takagi Y, Ko K, Kohno K, Nakaguro M, Nakamura S, Hontani H, Takeuchi I (2020) Multi-scale domain-adversarial multiple-instance CNN for cancer subtype classification with unannotated histopathological images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3852–3861
Ilse M, Tomczak J, Welling M (2018) Attention-based deep multiple instance learning. In: International conference on machine learning. PMLR, pp 2127–2136
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
Patil A, Talha M, Bhatia A, Kurian NC, Mangale S, Patel S, Sethi A (2021) Fast, self supervised, fully convolutional color normalization of h &e stained images. In: 2021 IEEE 18th international symposium on biomedical imaging (ISBI). IEEE, pp 1563–1567
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning. PMLR, pp 1597–1607
Lee SJ, Yun JP, Choi H, Kwon W, Koo G, Kim SW (2017) Weakly supervised learning with convolutional neural networks for power line localization. In: 2017 IEEE symposium series on computational intelligence (SSCI). IEEE, pp 1–8
Wang D, Khosla A, Gargeya R, Irshad H, Beck AH (2016) Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718
Naik N, Madani A, Esteva A, Keskar NS, Press MF, Ruderman D, Agus DB, Socher R (2020) Deep learning-enabled breast cancer hormonal receptor status determination from base-level h &e stains. Nat Commun 11(1):1–8
Campanella G, Hanna MG, Geneslaw L, Miraflor A, Silva VWK, Busam KJ, Brogi E, Reuter VE, Klimstra DS, Fuchs TJ (2019) Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med 25(8):1301–1309
Feng J, Zhou Z-H (2017) Deep miml network. In: Proceedings of the AAAI conference on artificial intelligence, vol 31
Pinheiro PO, Collobert R (2015) From image-level to pixel-level labeling with convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1713–1721
Wang X, Chen H, Gan C, Lin H, Dou Q, Tsougenis E, Huang Q, Cai M, Heng P-A (2019) Weakly supervised deep learning for whole slide lung cancer image analysis. IEEE Trans Cybern 50(9):3950–3962
Huang Y, Chung AC-s (2018) Improving high resolution histology image classification with deep spatial fusion network. In: Computational pathology and ophthalmic medical image analysis. Springer, pp 19–26
Tellez D, Litjens G, van der Laak J, Ciompi F (2019) Neural image compression for gigapixel histopathology image analysis. IEEE Trans Pattern Anal Mach Intell 43(2):567–578
Tomita N, Abdollahi B, Wei J, Ren B, Suriawinata A, Hassanpour S (2019) Attention-based deep neural networks for detection of cancerous and precancerous esophagus tissue on histopathological slides. JAMA Network Open 2(11):1914645
Tellez D, Höppener D, Verhoef C, Grünhagen D, Nierop P, Drozdzal M, Laak J, Ciompi F (2020) Extending unsupervised neural image compression with supervised multitask learning. In: Medical imaging with deep learning. PMLR, pp 770–783
Koohbanani NA, Unnikrishnan B, Khurram SA, Krishnaswamy P, Rajpoot N (2021) Self-path: self-supervision for classification of pathology images with limited annotations. IEEE Trans Med Imaging 40(10):2845–2856
Pu B, Zhu N, Li K, Li S (2021) Fetal cardiac cycle detection in multi-resource echocardiograms using hybrid classification framework. Fut Gener Comput Syst 115:825–836
Pu B, Lu Y, Chen J, Li S, Zhu N, Wei W, Li K (2022) Mobileunet-fpn: a semantic segmentation model for fetal ultrasound four-chamber segmentation in edge computing environments. IEEE J Biomed Health Inform 26(11): 5540–5550
Hou L, Nguyen V, Kanevsky AB, Samaras D, Kurc TM, Zhao T, Gupta RR, Gao Y, Chen W, Foran D et al (2019) Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images. Pattern Recogn 86:188–200
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9729–9738
Yang Y, Jiang J (2015) Hybrid sampling-based clustering ensemble with global and local constitutions. IEEE Trans Neural Netw Learn Syst 27(5):952–965
Hu J, Shen L, Albanie S, Sun G, Vedaldi A (2018) Gather-excite: exploiting feature context in convolutional neural networks. In: Advances in neural information processing systems
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Liu S, Huang D, et al (2018) Receptive field block net for accurate and fast object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 385–400
Cireşan DC, Giusti A, Gambardella LM, Schmidhuber J (2013) Mitosis detection in breast cancer histology images with deep neural networks. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 411–418
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901
Wu B, Xu C, Dai X, Wan A, Zhang P, Yan Z, Tomizuka M, Gonzalez J, Keutzer K, Vajda P (2020) Visual transformers: token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677
Yuan L, Chen Y, Wang T, Yu W, Shi Y, Jiang Z-H, Tay FE, Feng J, Yan S (2021) Tokens-to-token VIT: training vision transformers from scratch on imagenet. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 558–567
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision. Springer, pp 213–229
Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159
Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, Fu Y, Feng J, Xiang T, Torr PH, et al (2021) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/cvf conference on computer vision and pattern recognition, pp 6881–6890
Wang Y, Xu Z, Wang X, Shen C, Cheng B, Shen H, Xia H (2021) End-to-end video instance segmentation with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8741–8750
Chen M, Radford A, Child R, Wu J, Jun H, Luan D, Sutskever I (2020) Generative pretraining from pixels. In: International conference on machine learning. PMLR, pp 1691–1703
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth \(16 \times 16\) words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Srinivas A, Lin T-Y, Parmar N, Shlens J, Abbeel P, Vaswani A (2021) Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16519–16529
Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient image transformers & distillation through attention. In: International conference on machine learning. PMLR, pp 10347–10357
Shao Z, Bian H, Chen Y, Wang Y, Zhang J, Ji X, et al (2021) Transmil: transformer based correlated multiple instance learning for whole slide image classification. In: Advances in neural information processing systems
Chen H, Li C, Wang G, Li X, Rahaman MM, Sun H, Hu W, Li Y, Liu W, Sun C et al (2022) Gashis-transformer: a multi-scale visual transformer approach for gastric histopathological image detection. Pattern Recogn 130:108827
Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297
Oord Avd, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748
Bejnordi BE, Veta M, Van Diest PJ, Van Ginneken B, Karssemeijer N, Litjens G, Van Der Laak JA, Hermsen M, Manson QF, Balkenhol M et al (2017) Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Jama 318(22):2199–2210
Li B, Li Y, Eliceiri KW (2021) Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14318–14328
You Y, Gitman I, Ginsburg B (2017) Large batch training of convolutional networks. arXiv preprint arXiv:1708.03888
Lu MY, Williamson DF, Chen TY, Chen RJ, Barbieri M, Mahmood F (2021) Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5(6):555–570
Acknowledgements
This work was supported by the Natural Science Foundation of China (Grant No. 61876166), Yunnan Provincial Major Science and Technology Special Plan Project (Grant No. 202002AD080001) and Yunnan Basic Research Program for Distinguished Young Youths Project (Grant No. 202101AV070003).
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of interest
No conflict of interest exits in the submission of this manuscript, and manuscript is approved by all authors for publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Guo, J., Yang, Y. et al. CWC-transformer: a visual transformer approach for compressed whole slide image classification. Neural Comput & Applic 37, 7485–7497 (2025). https://doi.org/10.1007/s00521-022-07857-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-022-07857-3