Boosting Vision-Language Models for Histopathology Classification: Predict All at Once

Zanella, Maxime; Shakeri, Fereshteh; Huang, Yunshi; Bahig, Houda; Ayed, Ismail Ben

doi:10.1007/978-3-031-73471-7_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15184)

Included in the following conference series:

International Workshop on Foundation Models for General Medical AI

Abstract

The development of vision-language models (VLMs) for histopathology has shown promising new uses and zero-shot performance. However, current approaches, which decompose large slides into smaller patches, focus solely on inductive classification, i.e., the prediction for each patch is made independently of the other patches in the target test data. We extend the capability of these large models by introducing a transductive approach. By using text-based predictions and affinity relationships among patches, our approach leverages the strong zero-shot capabilities of these new VLMs without any additional labels. Our experiments cover four histopathology datasets and five different VLMs. Operating solely in the embedding space (i.e., in a black-box setting), our approach is highly efficient, processing $10^5$ patches in just a few seconds, and shows significant accuracy improvements over inductive zero-shot classification. Code is available at https://github.com/FereshteShakeri/Histo-TransCLIP.
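To make the inductive versus transductive distinction concrete, the following is a minimal sketch and not the authors' Histo-TransCLIP algorithm. It assumes precomputed patch and class-prompt embeddings, and it substitutes a simple label-propagation-style smoothing over a k-nearest-neighbour affinity graph for the paper's actual objective. The function names, the temperature of 100, and the parameters `k`, `alpha`, and `n_iters` are illustrative assumptions. The dense affinity matrix is only suitable for small demonstrations, not for $10^5$ patches.

```python
import numpy as np


def _l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def zero_shot_probs(image_emb, text_emb, temperature=100.0):
    """Inductive zero-shot prediction: each patch is scored independently.

    temperature=100.0 is an assumed CLIP-style logit scale.
    """
    logits = temperature * _l2_normalize(image_emb) @ _l2_normalize(text_emb).T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def knn_affinity(image_emb, k=3):
    """Dense symmetric k-NN affinity matrix from cosine similarity (requires k < n)."""
    emb = _l2_normalize(image_emb)
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    n = sim.shape[0]
    # Indices of the k most similar patches for every row.
    idx = np.argpartition(-sim, k, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    w = np.zeros((n, n))
    w[rows, idx.ravel()] = np.maximum(sim[rows, idx.ravel()], 0.0)
    return np.maximum(w, w.T)  # symmetrize


def transductive_refine(probs0, w, alpha=0.5, n_iters=10):
    """Jointly smooth predictions over the affinity graph.

    Each iteration mixes the text-based prediction of a patch with the
    affinity-weighted average prediction of its neighbours.
    """
    deg = w.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated patches keep their own prediction
    p = probs0.copy()
    for _ in range(n_iters):
        p = (1.0 - alpha) * probs0 + alpha * (w @ p) / deg
    return p / p.sum(axis=1, keepdims=True)
```

A typical call would compute `p0 = zero_shot_probs(image_emb, text_emb)`, build `w = knn_affinity(image_emb)`, and take `transductive_refine(p0, w).argmax(axis=1)` as the joint prediction for all patches. Like the approach described in the abstract, the sketch touches only embeddings, so the VLM itself stays a black box.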

M. Zanella and F. Shakeri contributed equally.

Acknowledgement

M. Zanella is funded by the Walloon region under grant No. 2010235 (ARIAC by DIGITALWALLONIA4.AI). F. Shakeri is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR).

Author information

Authors and Affiliations

Université Catholique de Louvain (UCLouvain), Louvain-La-Neuve, Belgium
Maxime Zanella
Université de Mons (UMons), Mons, Belgium
Maxime Zanella
École de Technologie Supérieure (ÉTS), Montréal, Canada
Fereshteh Shakeri, Yunshi Huang & Ismail Ben Ayed
Centre de Recherche du Centre Hospitalier de l’Université de Montréal (CRCHUM), Montréal, Canada
Fereshteh Shakeri, Yunshi Huang, Houda Bahig & Ismail Ben Ayed

Authors

Maxime Zanella
Fereshteh Shakeri
Yunshi Huang
Houda Bahig
Ismail Ben Ayed

Corresponding author

Correspondence to Maxime Zanella.

Editor information

Editors and Affiliations

University of Cambridge, Cambridge, UK
Zhongying Deng
Johns Hopkins University, Baltimore, MD, USA
Yiqing Shen
Korea University, Seoul, Korea (Republic of)
Hyunwoo J. Kim
Korea University, Seoul, Korea (Republic of)
Won-Ki Jeong
University of Cambridge, Cambridge, UK
Angelica I. Aviles-Rivero
Shanghai AI Laboratory, Shanghai, China
Junjun He
Shanghai AI Laboratory, Shanghai, China
Shaoting Zhang

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

About this paper

Cite this paper

Zanella, M., Shakeri, F., Huang, Y., Bahig, H., Ayed, I.B. (2025). Boosting Vision-Language Models for Histopathology Classification: Predict All at Once. In: Deng, Z., et al. Foundation Models for General Medical AI. MedAGI 2024. Lecture Notes in Computer Science, vol 15184. Springer, Cham. https://doi.org/10.1007/978-3-031-73471-7_16

DOI: https://doi.org/10.1007/978-3-031-73471-7_16
Published: 28 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73470-0
Online ISBN: 978-3-031-73471-7
eBook Packages: Computer Science, Computer Science (R0)
