Text-Guided Cross-Position Attention for Segmentation: Case of Medical Image

Lee, Go-Eun; Kim, Seon Ho; Cho, Jungchan; Choi, Sang Tae; Choi, Sang-Il

doi:10.1007/978-3-031-43904-9_52


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14224)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention


Abstract

We propose a novel text-guided cross-position attention module that applies multi-modal information from text and images to position attention in medical image segmentation. To match the dimension of the text features to that of the image feature map, we multiply the text features by learnable parameters and then combine the multi-modal semantics via cross-attention. This allows the model to learn the dependencies between various characteristics of the text and the image. Our proposed model outperforms other medical image segmentation models that use either image-only or image-text data. Furthermore, we use our module as a region of interest (RoI) generator to classify inflammation of the sacroiliac joints. The RoIs obtained from the model help improve the performance of the classification models.
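The two steps in the abstract (projecting text features with learnable parameters so they match the image feature map, then fusing the two modalities with cross-attention over spatial positions) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption rather than the authors' implementation: the tensor shapes, the hypothetical names `TextGuidedPositionAttention`, `P` and `W_t`, the choice of text-side queries with image-side keys and values, the scaled dot-product, and the zero-initialised residual weight `gamma`, which is borrowed from the position attention module of DANet.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class TextGuidedPositionAttention:
    """Hypothetical forward-pass sketch of text-guided cross-position attention.

    Assumed shapes (not taken from the paper):
      image: (C, H, W) feature map, flattened to N = H * W positions
      text:  (L, D) text token features
    """

    def __init__(self, channels, height, width, text_len, text_dim, key_dim, seed=0):
        rng = np.random.default_rng(seed)
        n = height * width
        self.shape = (channels, height, width)
        # Assumed learnable parameters that map text features onto the image layout:
        # P expands L text tokens to N positions, W_t maps D text dims to C channels.
        self.P = rng.normal(0.0, 0.02, size=(n, text_len))
        self.W_t = rng.normal(0.0, 0.02, size=(text_dim, channels))
        # Projections for queries (text side) and keys/values (image side)
        self.W_q = rng.normal(0.0, 0.02, size=(channels, key_dim))
        self.W_k = rng.normal(0.0, 0.02, size=(channels, key_dim))
        self.W_v = rng.normal(0.0, 0.02, size=(channels, channels))
        # Residual scale, initialised to zero as in DANet's position attention
        self.gamma = 0.0

    def __call__(self, image, text):
        c, h, w = self.shape
        feats = image.reshape(c, h * w).T         # (N, C) image features per position
        text_map = self.P @ text @ self.W_t       # (N, C) text matched to image dims
        q = text_map @ self.W_q                   # (N, K) queries from text
        k = feats @ self.W_k                      # (N, K) keys from image
        v = feats @ self.W_v                      # (N, C) values from image
        # (N, N) map: every position attends to every other position
        attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)
        out = self.gamma * (attn @ v) + feats     # residual connection
        return out.T.reshape(c, h, w), attn

rng = np.random.default_rng(1)
module = TextGuidedPositionAttention(channels=8, height=4, width=4,
                                     text_len=5, text_dim=16, key_dim=4)
image = rng.normal(size=(8, 4, 4))
text = rng.normal(size=(5, 16))
out, attn = module(image, text)
print(out.shape, attn.shape)  # (8, 4, 4) (16, 16)
```

With `gamma` at zero the module starts as an identity map on the image features, so the text-guided term is blended in only as `gamma` is learned. The N×N attention map is what makes this a *position* attention: each spatial location of the image is re-weighted by how well the text-derived query at that location matches every other location.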



Acknowledgements

This work was supported by the MSIT (Ministry of Science, ICT), Korea, under the High-Potential Individuals Global Training Program (RS-2022-00155227) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A2B5B01001412), and the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (RS-2023-00220408).

Author information

Authors and Affiliations

Dankook University, Yongin, Gyeonggi-do, Korea
Go-Eun Lee & Sang-Il Choi
University of Southern California, Los Angeles, USA
Seon Ho Kim
Gachon University, Seongnam, Gyeonggi-do, Korea
Jungchan Cho
Chung-Ang University College of Medicine, Seoul, Korea
Sang Tae Choi


Corresponding authors

Correspondence to Sang Tae Choi or Sang-Il Choi.

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA; Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

About this paper

Cite this paper

Lee, GE., Kim, S.H., Cho, J., Choi, S.T., Choi, SI. (2023). Text-Guided Cross-Position Attention for Segmentation: Case of Medical Image. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14224. Springer, Cham. https://doi.org/10.1007/978-3-031-43904-9_52

DOI: https://doi.org/10.1007/978-3-031-43904-9_52
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43903-2
Online ISBN: 978-3-031-43904-9
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society