Few-Shot Adaptation of Medical Vision-Language Models

  • Conference paper
  • In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Abstract

Integrating image and text data through multi-modal learning has emerged as a new approach in medical imaging research, following its successful deployment in computer vision. While considerable effort has been devoted to building medical foundation models and evaluating their zero-shot transfer to downstream tasks, the popular few-shot setting remains relatively unexplored. Following the recent strong emergence of this setting in computer vision, we introduce the first structured benchmark for adapting medical vision-language models (VLMs) in a strict few-shot regime and investigate various adaptation strategies commonly used in the context of natural images. Furthermore, we evaluate a simple generalization of the linear-probe adaptation baseline, which seeks an optimal blending of the visual prototypes and text embeddings via learnable class-wise multipliers. Surprisingly, such a text-informed linear probe yields performance competitive with more convoluted prompt-learning and adapter-based strategies, while running considerably faster and accommodating the black-box setting. Our extensive experiments span three different medical modalities and specialized foundation models, nine downstream tasks, and several state-of-the-art few-shot adaptation methods. We make our benchmark and code publicly available to trigger further developments in this emergent subject: https://github.com/FereshteShakeri/few-shot-MedVLMs.
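As a rough illustration of the idea described above (a sketch, not the authors' released code), the text-informed linear probe can be viewed as a classifier whose class-c weight vector blends the class text embedding with the few-shot visual prototype through a learnable class-wise multiplier. The snippet below uses synthetic stand-ins for a frozen VLM's text embeddings and few-shot visual features, and trains only the multipliers with plain gradient descent on a cross-entropy loss (the choice of optimizer here is an assumption for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, S = 3, 8, 4  # classes, embedding dimension, shots per class

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical stand-ins for a frozen VLM's outputs: class text embeddings
# and few-shot visual features clustered around them, all L2-normalized.
text_emb = l2norm(rng.normal(size=(C, D)))
feats = l2norm(text_emb[:, None, :] + 0.3 * rng.normal(size=(C, S, D)))
labels = np.repeat(np.arange(C), S)
X = feats.reshape(C * S, D)

proto = l2norm(feats.mean(axis=1))  # visual prototypes (class means)
alpha = np.ones(C)                  # learnable class-wise multipliers

def logits(a):
    # Class-c weight vector: text embedding blended with visual prototype.
    W = text_emb + a[:, None] * proto
    return X @ W.T

def ce_and_grad(a):
    # Softmax cross-entropy over the few shots, with the analytic gradient
    # chained through W_c = t_c + alpha_c * v_c down to alpha.
    z = logits(a)
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(C)[labels]
    loss = -np.log(p[np.arange(len(labels)), labels]).mean()
    gW = (p - onehot).T @ X / len(labels)     # dL/dW, shape (C, D)
    return loss, (gW * proto).sum(axis=1)     # dL/dalpha, shape (C,)

loss0, _ = ce_and_grad(alpha)
loss = loss0
for _ in range(100):  # plain gradient descent on alpha only
    loss, g = ce_and_grad(alpha)
    alpha = alpha - 1.0 * g

acc = (logits(alpha).argmax(axis=1) == labels).mean()
```

Because only the C multipliers are optimized while the encoders stay frozen, this kind of probe needs only pre-computed embeddings, which is what makes it cheap and compatible with black-box models.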

F. Shakeri and Y. Huang—Equal contributions.



Acknowledgement

This work was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Montreal University Hospital Research Center (CRCHUM). We also thank Calcul Quebec and Compute Canada.

Author information

Corresponding author

Correspondence to Fereshteh Shakeri.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shakeri, F. et al. (2024). Few-Shot Adaptation of Medical Vision-Language Models. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15012. Springer, Cham. https://doi.org/10.1007/978-3-031-72390-2_52

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-72390-2_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72389-6

  • Online ISBN: 978-3-031-72390-2

  • eBook Packages: Computer Science (R0)
