MAdapter: A Better Interaction Between Image and Language for Medical Image Segmentation

Zhang, Xu; Ni, Bo; Yang, Yang; Zhang, Lefei

doi:10.1007/978-3-031-72114-4_41

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15009)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Conventional medical image segmentation methods rely on images alone and therefore require a sufficient number of high-quality labeled images. Text-guided segmentation methods have been widely regarded as a way to break this performance bottleneck. In this study, we introduce a bidirectional Medical Adaptor (MAdapter) in which visual and linguistic features extracted from pre-trained dual encoders undergo interactive fusion. In addition, a specialized decoder is designed to further align the fused representation with the global textual representation. We also extend the endoscopic polyp datasets with clinically oriented text annotations, following the guidance of medical professionals. Extensive experiments on both the extended endoscopic polyp dataset and additional lung infection datasets demonstrate the superiority of our method. The code and text annotations are available at https://github.com/XShadow22/MAdapter.
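
The abstract does not spell out how the interactive fusion is computed, so the following is only a minimal sketch of one common way to exchange information in both directions between two token sets: parameter-free scaled dot-product cross-attention with residual connections. It is not the authors' MAdapter. The function names (`cross_attend`, `bidirectional_fuse`), the absence of learned projections, and the assumption that both modalities share one feature width are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query token attends over the other modality.

    Assumption: no learned projections, so both inputs share one feature width.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended


def bidirectional_fuse(visual, text):
    """Update both token sets from the original features of the other, with residuals."""
    vis_ctx = cross_attend(visual, text)  # image tokens read from text tokens
    txt_ctx = cross_attend(text, visual)  # text tokens read from image tokens
    fused_visual = [
        [v + c for v, c in zip(tok, ctx)] for tok, ctx in zip(visual, vis_ctx)
    ]
    fused_text = [
        [t + c for t, c in zip(tok, ctx)] for tok, ctx in zip(text, txt_ctx)
    ]
    return fused_visual, fused_text


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 16 image tokens and 5 text tokens, each of width 8.
    visual_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
    text_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
    fused_v, fused_t = bidirectional_fuse(visual_tokens, text_tokens)
    print(len(fused_v), len(fused_v[0]), len(fused_t), len(fused_t[0]))
```

Both outputs keep the token count and width of their inputs, so a step like this could in principle be repeated at several encoder stages. A practical adapter would normally add learned projections and normalization, which this sketch leaves out.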

Acknowledgments

This project is funded by the Cross-Innovative Talent Project of RenMin Hospital of Wuhan University (Project No: JCRCZN-2022-006).

Author information

Authors and Affiliations

School of Computer Science, Wuhan University, Wuhan, China
Xu Zhang & Lefei Zhang
Computer School, Hubei Polytechnic University, Huangshi, China
Bo Ni
Renmin Hospital, Wuhan University, Wuhan, China
Yang Yang

Corresponding authors

Correspondence to Yang Yang or Lefei Zhang.

Editor information

Editors and Affiliations

Children’s National Hospital/George Washington University, Washington, DC, USA
Marius George Linguraru
The Chinese University of Hong Kong, Hong Kong, China
Qi Dou
Technical University of Denmark, Kgs Lyngby, Denmark
Aasa Feragen
Imperial College London, London, UK
Stamatia Giannarou
Imperial College London, London, UK
Ben Glocker
Universitat de Barcelona, Barcelona, Spain
Karim Lekadir
Helmholtz Munich, Technical University of Munich and King’s College London, Munich, Germany
Julia A. Schnabel

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Electronic supplementary material

Supplementary material 1 (pdf 134 KB)

About this paper

Cite this paper

Zhang, X., Ni, B., Yang, Y., Zhang, L. (2024). MAdapter: A Better Interaction Between Image and Language for Medical Image Segmentation. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15009. Springer, Cham. https://doi.org/10.1007/978-3-031-72114-4_41

DOI: https://doi.org/10.1007/978-3-031-72114-4_41
Published: 03 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72113-7
Online ISBN: 978-3-031-72114-4
eBook Packages: Computer Science, Computer Science (R0)
