Abstract
Contrastive Language-Image Pre-training (CLIP) has demonstrated a strong ability to learn distinctive visual representations that generalize across diverse vision tasks. However, its effectiveness in pathology image analysis, particularly with limited labeled data, remains an open question due to significant domain shifts and catastrophic forgetting. Efficient adaptation strategies are therefore needed to enable scalable analysis in this domain. In this study, we introduce Path-CLIP, a framework tailored for swift adaptation of CLIP to various pathology tasks. First, we propose Residual Feature Refinement (RFR) with a dynamically adjustable ratio to integrate and balance source and task-specific knowledge. Second, we introduce Hidden Representation Perturbation (HRP) and Dual-view Vision Contrastive (DVC) techniques to mitigate overfitting. Finally, we present the Doublet Multimodal Contrastive Loss (DMCL) for fine-tuning CLIP on pathology tasks. We demonstrate that Path-CLIP adeptly adapts pre-trained CLIP to downstream pathology tasks, yielding competitive results. Specifically, Path-CLIP achieves over +19% improvement in accuracy on PCam using a mere 0.1% of labeled data, with only 10 minutes of fine-tuning on a single GPU.
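To make the abstract's mechanisms concrete, below is a minimal PyTorch sketch of how RFR and HRP might be realized. This is an assumption-based illustration, not the paper's implementation: the class and parameter names (RFRAdapter, ratio_logit, std), the bottleneck MLP, the sigmoid-parameterized ratio, and the Gaussian form of the perturbation are all hypothetical choices consistent with the abstract's description of a residual blend with a dynamically adjustable ratio.

```python
# Hypothetical sketch of Residual Feature Refinement (RFR): a small adapter
# refines frozen CLIP features, and a learnable ratio balances the adapted
# (task-specific) features against the original (source) features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RFRAdapter(nn.Module):
    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        # Bottleneck MLP producing task-specific residual features.
        self.adapter = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )
        # Learnable logit for the mixing ratio; the sigmoid keeps it in
        # (0, 1) so the ratio can be adjusted dynamically during training.
        self.ratio_logit = nn.Parameter(torch.zeros(1))

    def forward(self, clip_features: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.ratio_logit)
        refined = self.adapter(clip_features)
        # Residual blend of source (CLIP) and task-specific knowledge.
        mixed = alpha * refined + (1.0 - alpha) * clip_features
        return F.normalize(mixed, dim=-1)


def hidden_representation_perturbation(h: torch.Tensor, std: float = 0.01):
    # Assumed form of HRP: inject small Gaussian noise into hidden states
    # during fine-tuning to regularize against overfitting; the paper's
    # actual noise type and schedule may differ.
    return h + torch.randn_like(h) * std


# Usage: embeddings from a frozen CLIP image encoder are refined in place.
features = torch.randn(8, 512)      # e.g., a batch of CLIP embeddings
adapter = RFRAdapter(dim=512)
refined = adapter(hidden_representation_perturbation(features))
```

In practice, such an adapter would sit on top of a frozen CLIP encoder and be trained jointly with the mixing ratio on the downstream pathology task; the sketch only shows the blending and perturbation mechanisms.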
Acknowledgments
This work was supported by the Noyce Initiative UC Partnerships in Computational Transformation Grant and the Child Family Endowed Professorship. Resources for this study were funded in part by grants from the National Institute on Aging of the National Institutes of Health under Award Numbers R01AG062517, P30AG072972, and R01AG056519.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lai, Z., Chauhan, J., Dugger, B.N., Chuah, C.N. (2025). Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol. 15122. Springer, Cham. https://doi.org/10.1007/978-3-031-73039-9_15
DOI: https://doi.org/10.1007/978-3-031-73039-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73038-2
Online ISBN: 978-3-031-73039-9