Diffusion Transformer U-Net for Medical Image Segmentation

Chowdary, G. Jignesh; Yin, Zhaozheng

doi:10.1007/978-3-031-43901-8_59

G. Jignesh Chowdary¹⁴ &
Zhaozheng Yin¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14223))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

5738 Accesses
2 Citations

Abstract

Diffusion model has shown its power on various generation tasks. When applying the diffusion model in medical image segmentation, there are a few roadblocks to remove: the semantic features required for the conditioning of the diffusion process are not well aligned with the noise embedding; and the U-Net backbone employed in these diffusion models is not sensitive to contextual information that is essential during the reverse diffusion process for accurate pixel-level segmentation. To overcome these limitations, we present a cross-attention module to enhance the conditioning from source images, and a transformer based U-Net with multi-sized windows for the extraction of various scales of contextual information. Evaluated on five benchmark datasets with different imaging modalities including Kvasir-Seg, CVC Clinic DB, ISIC 2017, ISIC 2018, and Refuge, our diffusion transformer U-Net achieves great generalization ability and outperforms all the state-of-the-art models on these datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Bernal, J., Sánchez, F.J., Fernández-Esparrach, G., Gil, D., Rodríguez, C., Vilariño, F.: Wm-dova maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 43, 99–111 (2015)
Google Scholar
Cao, H., et al.: Swin-unet: Unet-like pure transformer for medical image segmentation. In: ECCV 2022, Part III, pp. 205–218. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-25066-8_9
Chen, J., et al.: Transunet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
Codella, N., et al.: Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the international skin imaging collaboration (ISIC). arXiv preprint arXiv:1902.03368 (2019)
Codella, N.C., et al.: Skin lesion analysis toward melanoma detection: a challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (isic). In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 168–172. IEEE (2018)
Google Scholar
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Gu, R., et al.: Ca-net: comprehensive attention convolutional neural networks for explainable medical image segmentation. IEEE Trans. Med. Imaging 40(2), 699–711 (2020)
Article Google Scholar
Heidari, M., et al.: Hiformer: hierarchical multi-scale representations using transformers for medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 6202–6212 (2023)
Google Scholar
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural. Inf. Process. Syst. 33, 6840–6851 (2020)
Google Scholar
Jha, D., et al.: Kvasir-SEG: a segmented polyp dataset. In: Ro, Y.M., Cheng, W.-H., Kim, J., Chu, W.-T., Cui, P., Choi, J.-W., Hu, M.-C., De Neve, W. (eds.) MMM 2020. LNCS, vol. 11962, pp. 451–462. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_37
Chapter Google Scholar
Lin, A., Xu, J., Li, J., Lu, G.: Contrans: improving transformer with convolutional attention for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention-MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part V. pp. 297–307. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16443-9_29
Nichol, A.Q., Dhariwal, P.: Improved denoising diffusion probabilistic models. In: International Conference on Machine Learning, pp. 8162–8171. PMLR (2021)
Google Scholar
Oktay, O., et al.: Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
Orlando, J.I., et al.: Refuge challenge: a unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Med. Image Anal. 59, 101570 (2020)
Article Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Sharma, P., Gautam, A., Maji, P., Pachori, R.B., Balabantaray, B.K.: Li-segpnet: encoder-decoder mode lightweight segmentation network for colorectal polyps analysis. IEEE Trans. Biomed. Eng. (2022)
Google Scholar
Srivastava, A., et al.: Msrf-net: a multi-scale residual fusion network for biomedical image segmentation. IEEE J. Biomed. Health Inform. 26(5), 2252–2263 (2021)
Article MathSciNet Google Scholar
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)
Google Scholar
Tschandl, P., Rosendahl, C., Kittler, H.: The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data 5(1), 1–9 (2018)
Article Google Scholar
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
Google Scholar
Wang, J., Huang, Q., Tang, F., Meng, J., Su, J., Song, S.: Stepwise feature fusion: Local guides global. In: Medical Image Computing and Computer Assisted Intervention-MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part III. pp. 110–120. Springer (2022)
Google Scholar
Wu, H., Chen, S., Chen, G., Wang, W., Lei, B., Wen, Z.: Fat-net: feature adaptive transformers for automated skin lesion segmentation. Med. Image Anal. 76, 102327 (2022)
Article Google Scholar
Wu, J., Fang, H., Zhang, Y., Yang, Y., Xu, Y.: Medsegdiff: medical image segmentation with diffusion probabilistic model. arXiv preprint arXiv:2211.00611 (2022)
Wu, J., Fu, R., Fang, H., Zhang, Y., Xu, Y.: Medsegdiff-v2: diffusion based medical image segmentation with transformer. arXiv preprint arXiv:2301.11798 (2023)
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: simple and efficient design for semantic segmentation with transformers. Adv. Neural. Inf. Process. Syst. 34, 12077–12090 (2021)
Google Scholar
Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: Unet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39(6), 1856–1867 (2019)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Stony Brook University, Stony Brook, NY, USA
G. Jignesh Chowdary & Zhaozheng Yin

Authors

G. Jignesh Chowdary
View author publications
You can also search for this author in PubMed Google Scholar
Zhaozheng Yin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to G. Jignesh Chowdary .

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA, Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen's University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2607 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chowdary, G.J., Yin, Z. (2023). Diffusion Transformer U-Net for Medical Image Segmentation. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14223. Springer, Cham. https://doi.org/10.1007/978-3-031-43901-8_59

Download citation

DOI: https://doi.org/10.1007/978-3-031-43901-8_59
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43900-1
Online ISBN: 978-3-031-43901-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)

Diffusion Transformer U-Net for Medical Image Segmentation