Multiscale TransUNet++: dense hybrid U-Net with transformer for medical image segmentation

Wang, Bo; Wang, Fan; Dong, Pengwei; Li, Chongyi

doi:10.1007/s11760-021-02115-w

Original Paper
Published: 27 January 2022

Volume 16, pages 1607–1614, (2022)
Signal, Image and Video Processing

Bo Wang¹, Fan Wang¹, Pengwei Dong¹ & Chongyi Li²

Abstract

Automatic medical image segmentation assists doctors and is important for the diagnosis and treatment of various diseases. TransUNet, which integrates the advantages of transformers and CNNs, has achieved success in medical image segmentation tasks. However, TransUNet simply combines feature maps between encoder and decoder via skip connections at the same resolution, which is an unnecessarily restrictive fusion design. Moreover, the positional encoding and input tokens in the standard transformer blocks of TransUNet have a fixed scale, which is not suitable for dense prediction. To alleviate these problems, we propose a novel architecture named multiscale TransUNet++ (MS-TransUNet++), which employs a multiscale and flexible feature fusion scheme between encoder and decoder at different levels. The novel skip connections densely bridge the extracted feature representations at different resolutions, and the hybrid CNN-Transformer encoder with long-range dependencies passes the high-level features directly to each stage of the decoder. In addition, to obtain more effective feature representations, an efficient multiscale visual transformer is introduced for the feature encoder. More importantly, we employ a weighted loss function composed of focal loss, multiscale structural similarity and the Jaccard index to penalize the training error of medical image segmentation, jointly realizing pixel-level, patch-level and map-level optimization. Extensive experimental results demonstrate that the proposed multiscale TransUNet++ achieves competitive performance for prostate MR and liver CT image segmentation.
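
The weighted loss described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the default weights, the focusing parameter `gamma`, and the omission of the focal class-balancing factor are all assumptions made for illustration, and the multiscale structural similarity term is not computed here but passed in as a caller-supplied value (`ms_ssim_loss`). Predictions are flattened foreground probabilities and targets are flattened binary labels.

```python
import math
from typing import Sequence, Tuple


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Pixel-level term: mean of -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed default; the class-balancing factor is omitted.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p        # probability of the true class
        p_t = min(max(p_t, eps), 1.0 - eps)   # clamp so the log stays finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def jaccard_loss(probs: Sequence[float], targets: Sequence[int],
                 eps: float = 1e-7) -> float:
    """Map-level term: 1 - soft intersection-over-union of the whole map."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - inter
    return 1.0 - (inter + eps) / (union + eps)


def hybrid_loss(probs: Sequence[float], targets: Sequence[int],
                ms_ssim_loss: float = 0.0,
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the focal, structural-similarity and Jaccard terms.

    ms_ssim_loss is a hypothetical, precomputed patch-level term
    (for example 1 - MS-SSIM); the weights are placeholders, not values
    taken from the paper.
    """
    w_focal, w_ssim, w_jaccard = weights
    return (w_focal * focal_loss(probs, targets)
            + w_ssim * ms_ssim_loss
            + w_jaccard * jaccard_loss(probs, targets))


if __name__ == "__main__":
    targets = [1, 1, 0, 0]
    print(hybrid_loss([0.9, 0.8, 0.2, 0.1], targets))  # mostly correct prediction
    print(hybrid_loss([0.1, 0.2, 0.8, 0.9], targets))  # mostly wrong prediction
```

In this sketch the focal term scores each pixel independently, while the Jaccard term scores the overlap of the whole predicted map, which is the pixel-level and map-level split the abstract refers to; the patch-level structural term would be supplied by a separate MS-SSIM routine.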

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62041108); the Natural Science Foundation of Ningxia (No. 2020AAC03029); Innovation and Entrepreneurship Project for Returnees in Ningxia 2020.

Author information

Authors and Affiliations

School of Physics and Electronic-Electrical Engineering, Ningxia University, Yinchuan, 750021, People’s Republic of China
Bo Wang, Fan Wang & Pengwei Dong
School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
Chongyi Li

Corresponding author

Correspondence to Bo Wang.

Ethics declarations

Conflicts of interest

No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.

Code availability

The code can be shared in the near future for the sake of development.

Data availability

The raw data can be shared with researchers working on a relevant topic, provided they cite it in their papers.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, B., Wang, F., Dong, P. et al. Multiscale TransUNet++: dense hybrid U-Net with transformer for medical image segmentation. SIViP 16, 1607–1614 (2022). https://doi.org/10.1007/s11760-021-02115-w

Received: 25 August 2021
Revised: 21 November 2021
Accepted: 05 December 2021
Published: 27 January 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11760-021-02115-w
