Abstract
The development of vision-language models (VLMs) for histopathology has shown promising new use cases and zero-shot performance. However, current approaches, which decompose large slides into smaller patches, focus solely on inductive classification, i.e., the prediction for each patch is made independently of the other patches in the target test data. We extend the capability of these large models by introducing a transductive approach. By using text-based predictions and affinity relationships among patches, our approach leverages the strong zero-shot capabilities of these new VLMs without any additional labels. Our experiments cover four histopathology datasets and five different VLMs. Operating solely in the embedding space (i.e., in a black-box setting), our approach is highly efficient, processing \(10^5\) patches in just a few seconds, and shows significant accuracy improvements over inductive zero-shot classification. Code is available at https://github.com/FereshteShakeri/Histo-TransCLIP.
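The difference between inductive and transductive zero-shot prediction can be illustrated with a minimal sketch. This is not the Histo-TransCLIP objective from the linked repository: it swaps in a generic label-propagation scheme, in which text-based zero-shot probabilities are smoothed over a k-nearest-neighbour affinity graph built from the patch embeddings, so that a patch's prediction can be corrected by its neighbours. All function names, the CLIP-style temperature of 0.01, `k`, `alpha`, and the toy embeddings are illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (cosine similarity then equals a dot product)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def zero_shot_probs(image_embs, text_embs, temperature=0.01):
    """Inductive zero-shot: each patch is scored against the class-prompt
    embeddings on its own, ignoring every other patch in the test set."""
    texts = [normalize(t) for t in text_embs]
    probs = []
    for x in image_embs:
        x = normalize(x)
        scores = [dot(x, t) / temperature for t in texts]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs.append([e / z for e in exps])
    return probs


def knn_affinity(image_embs, k):
    """Symmetric k-nearest-neighbour graph over patch embeddings, with
    non-negative cosine similarities as edge weights (brute force, O(n^2))."""
    xs = [normalize(x) for x in image_embs]
    W = [dict() for _ in xs]
    for i, xi in enumerate(xs):
        sims = sorted(((dot(xi, xj), j) for j, xj in enumerate(xs) if j != i), reverse=True)
        for s, j in sims[:k]:
            w = max(s, 0.0)
            # Store the edge in both directions so the graph is symmetric.
            W[i][j] = max(W[i].get(j, 0.0), w)
            W[j][i] = max(W[j].get(i, 0.0), w)
    return W


def transductive_refine(probs, W, alpha=0.9, iters=50):
    """Label propagation anchored on the text-based predictions P:
    Z <- alpha * D^-1 W Z + (1 - alpha) * P, then the argmax per patch."""
    c = len(probs[0])
    Z = [row[:] for row in probs]
    for _ in range(iters):
        new_Z = []
        for i, p in enumerate(probs):
            deg = sum(W[i].values())
            if deg > 0:
                # Affinity-weighted average of the neighbours' current predictions.
                nb = [sum(w * Z[j][k] for j, w in W[i].items()) / deg for k in range(c)]
            else:
                nb = Z[i]  # isolated patch: keep its own prediction
            new_Z.append([alpha * nb[k] + (1 - alpha) * p[k] for k in range(c)])
        Z = new_Z
    return [argmax(z) for z in Z]
```

As a toy check, take 2-D patch embeddings at angles 30°, 35°, 40°, 46°, 80°, 85° and 88° with class text embeddings [1, 0] and [0, 1]. The inductive argmax assigns the 46° patch to class 1, whereas with `k=2` and `alpha=0.9` the propagation reassigns it to class 0, because both of its nearest neighbours are confidently class 0. Here `alpha` trades off fidelity to the text-based predictions against agreement with neighbouring patches; the brute-force neighbour search is quadratic, so approaching the throughput reported in the abstract would require vectorized or approximate nearest-neighbour search.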
M. Zanella and F. Shakeri contributed equally.
Acknowledgement
M. Zanella is funded by the Walloon region under grant No. 2010235 (ARIAC by DIGITALWALLONIA4.AI). F. Shakeri is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zanella, M., Shakeri, F., Huang, Y., Bahig, H., Ayed, I.B. (2025). Boosting Vision-Language Models for Histopathology Classification: Predict All at Once. In: Deng, Z., et al. Foundation Models for General Medical AI. MedAGI 2024. Lecture Notes in Computer Science, vol 15184. Springer, Cham. https://doi.org/10.1007/978-3-031-73471-7_16
DOI: https://doi.org/10.1007/978-3-031-73471-7_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73470-0
Online ISBN: 978-3-031-73471-7
eBook Packages: Computer Science, Computer Science (R0)