Abstract
Multi-modality learning, exemplified by CLIP pre-trained on language-image pairs, has demonstrated remarkable zero-shot capabilities and has gained significant attention in the field. However, directly applying language-image pre-trained CLIP to medical image analysis incurs a substantial domain shift, resulting in significant performance degradation due to inherent disparities between natural (non-medical) and medical image characteristics. To address this challenge and uphold or even enhance CLIP's zero-shot capability in medical image analysis, we develop a novel framework, Core-Periphery feature alignment for CLIP (CP-CLIP), tailored for medical images and their corresponding clinical reports. Leveraging the core-periphery organization widely observed in brain networks, we augment CLIP with a novel core-periphery-guided neural network. This auxiliary CP network not only aligns text and image features into a unified latent space more efficiently but also ensures that the alignment is driven by domain-specific core information in medical images and clinical reports. In this way, our approach effectively mitigates the domain shift and further enhances CLIP's zero-shot performance in medical image analysis. Moreover, CP-CLIP exhibits strong explanatory capability, enabling the automatic identification of critical regions in clinical analysis. Extensive experiments across five public datasets underscore the superiority of CP-CLIP in zero-shot medical image prediction and critical area detection, showing its promising utility for multimodal feature alignment in medical applications.
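The alignment objective that CP-CLIP builds upon is the standard CLIP-style symmetric contrastive loss, which pulls matched image-text embedding pairs together and pushes mismatched pairs apart in a shared latent space. The following is a minimal NumPy sketch of that generic objective only; it does not implement the paper's core-periphery-guided network, and all function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (B, D) arrays; row i of each is a matched pair.
    Returns a scalar loss that is small when matched pairs are the
    most similar entries in the batch similarity matrix.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature       # (B, B) cosine similarities
    labels = np.arange(len(logits))          # matched pairs lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)           # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

As a sanity check, a batch whose image and text embeddings are identical (perfectly aligned) should score a lower loss than the same batch with the text rows shuffled, since shuffling moves the true matches off the diagonal.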
Acknowledgments
This work was supported by National Institutes of Health (R01AG075582 and RF1NS128534).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, X., Wu, Z., Zhang, L., Zhang, J., Lyu, Y., Zhu, D. (2024). CP-CLIP: Core-Periphery Feature Alignment CLIP for Zero-Shot Medical Image Analysis. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15003. Springer, Cham. https://doi.org/10.1007/978-3-031-72384-1_9
DOI: https://doi.org/10.1007/978-3-031-72384-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72383-4
Online ISBN: 978-3-031-72384-1
eBook Packages: Computer Science (R0)