Abstract
Since the emergence of the ImageNet dataset, pretraining followed by fine-tuning has become widely adopted in computer vision, because ImageNet-pretrained models learn a wide variety of visual features. However, adapting these models to domain-specific fields such as digital pathology is challenging because of the substantial gap between domains. To address this limitation, foundation models (FMs) have been trained on large-scale in-domain datasets to learn the intricate features of histopathology images. In cancer diagnosis, whole-slide image (WSI) prediction is essential for patient prognosis, and multiple instance learning (MIL) is used to handle the gigapixel size of WSIs. Because MIL frameworks rely on patch-level feature aggregation, this work compares the performance of feature extractors developed under different pretraining strategies for cancer subtyping on WSIs under a MIL framework. Results show that foundation models surpass ImageNet-pretrained models in predicting six skin cancer subtypes.
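The patch-level feature aggregation that MIL relies on can be illustrated with a minimal sketch. The abstract does not state which aggregator the authors use, so this example assumes attention-based pooling, one common choice. The feature vectors are random stand-ins for the outputs of a frozen feature extractor, and the names `attention_mil_pool`, `V`, `w`, and `W_cls`, along with all dimensions except the six classes, are hypothetical. Plain Python lists are used so the sketch has no dependencies.

```python
import math
import random


def attention_mil_pool(features, V, w):
    """Pool N patch feature vectors (each of length D) into one slide vector.

    V (D x H) and w (length H) are hypothetical learnable parameters;
    in a real model they would be trained, here they are random.
    """
    dim = len(features[0])
    scores = []
    for f in features:
        # Project each patch feature to a hidden vector, then to a scalar score.
        hidden = [math.tanh(sum(f[d] * V[d][h] for d in range(dim)))
                  for h in range(len(w))]
        scores.append(sum(hv * wv for hv, wv in zip(hidden, w)))

    # Softmax over patches gives one attention weight per patch.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The slide embedding is the attention-weighted sum of patch features.
    slide = [sum(a * f[d] for a, f in zip(weights, features))
             for d in range(dim)]
    return slide, weights


random.seed(0)
n_patches, feat_dim, hidden_dim = 20, 16, 8  # toy sizes (assumption)
n_classes = 6                                # six skin cancer subtypes

# Random stand-ins for features from a frozen patch-level extractor.
features = [[random.gauss(0, 1) for _ in range(feat_dim)]
            for _ in range(n_patches)]
V = [[random.gauss(0, 0.1) for _ in range(hidden_dim)]
     for _ in range(feat_dim)]
w = [random.gauss(0, 0.1) for _ in range(hidden_dim)]
W_cls = [[random.gauss(0, 0.1) for _ in range(n_classes)]
         for _ in range(feat_dim)]

slide_embedding, attention = attention_mil_pool(features, V, w)

# A linear classifier on the slide embedding yields one logit per subtype.
logits = [sum(slide_embedding[d] * W_cls[d][c] for d in range(feat_dim))
          for c in range(n_classes)]
predicted_subtype = max(range(n_classes), key=logits.__getitem__)
```

In this setup only the aggregator and classifier would be trained, so swapping the feature extractor (ImageNet-pretrained versus a pathology foundation model) changes only the `features` input, which is what makes the comparison described above possible.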
Funding
This work has received funding from the Spanish Ministry of Economy and Competitiveness through projects PID2019-105142RB-C21 (AI4SKIN) and PID2022-140189OB-C21 (ASSIST). The work of Rocío del Amor and Pablo Meseguer has been supported by the Spanish Ministry of Universities under an FPU Grant (FPU20/05263) and valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence, respectively.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Meseguer, P., del Amor, R., Colomer, A., Naranjo, V. (2025). Foundation Models for Slide-Level Cancer Subtyping in Digital Pathology. In: Juan, A.A., Faulin, J., Lopez-Lopez, D. (eds) Decision Sciences. DSA ISC 2024. Lecture Notes in Computer Science, vol 14779. Springer, Cham. https://doi.org/10.1007/978-3-031-78241-1_18
DOI: https://doi.org/10.1007/978-3-031-78241-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78240-4
Online ISBN: 978-3-031-78241-1
eBook Packages: Computer Science (R0)