A swin-transformer-based network with inductive bias ability for medical image segmentation

Published in: Applied Intelligence

Abstract

Accurately segmenting organs or diseased regions of varying sizes is a challenge in medical image segmentation tasks with limited datasets. Although Transformer-based methods, with their self-attention mechanisms, have excellent global modelling abilities and can effectively focus on crucial areas in medical images, they still face the intractable issues of computational complexity and excessive reliance on data. The Swin-Transformer has partly addressed the problem of computational complexity; however, it still requires large amounts of training data because it lacks inductive bias. When applied to medical image segmentation tasks with small datasets, this leads to suboptimal performance. CNN-based methods, in contrast, can compensate for this limitation. To address this issue, this paper proposes Swin-IBNet, which combines the Swin-Transformer with a CNN in a novel manner to imbue it with inductive bias, reducing its reliance on data. During encoding, Swin-IBNet employs two novel and crucial modules: the feature fusion block (FFB) and the multiscale feature aggregation block (MSFA). The FFB propagates the inductive bias of the CNN to the Swin-Transformer encoder. Unlike previous uses of multiscale features, the MSFA efficiently leverages multiscale information from different layers through self-learning. This paper not only analyses the interpretability of the proposed Swin-IBNet but also validates it on the public Synapse, ISIC 2018 and ACDC datasets. The experimental results show that Swin-IBNet outperforms the baseline, Swin-Unet, and several state-of-the-art methods; on the Synapse dataset in particular, the DSC of Swin-IBNet surpasses that of Swin-Unet by 3.45%.
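The abstract states that the FFB propagates the CNN's inductive bias to the Swin-Transformer encoder but does not specify its internals. As a rough illustrative sketch only (the names `conv3x3`, `ffb_fuse` and the weighting `alpha` are hypothetical, not the paper's design), a convolution hard-codes locality and translation equivariance, and a fusion step then blends those local features with the transformer branch's global features:

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same' 3x3 convolution on a single-channel feature map.

    Convolution bakes in locality and translation equivariance, which are
    the inductive biases a CNN branch contributes to the encoder.
    """
    H, W = x.shape
    padded = np.pad(x, 1)
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def ffb_fuse(local_feat, global_feat, alpha=0.5):
    """Hypothetical FFB-style fusion: a convex combination of the CNN's
    local features and the transformer's global features."""
    return alpha * local_feat + (1.0 - alpha) * global_feat
```

In a real implementation both branches would be learned multi-channel modules and the fusion would likely involve learned parameters; this sketch only shows where the locality bias enters and where the two feature streams meet.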



Data availability statement

All data used in this work are publicly available.


Acknowledgements

This work was financially supported by the National Natural Science Foundation of China under Grant 62172184, the Science and Technology Development Plan of Jilin Province of China under Grants 20200401077GX and 20200201292JC and the Jilin Educational Scientific Research Leading Group (ZD21003, JS2338).

Author information

Contributions

Yan Gao conceived and designed the research, conducted the experiments, and authored the paper through iterative revisions of the manuscript. Professor Xiangjiu Che, the corresponding author, checked the manuscript, provided valuable insights, and managed correspondence and responses with the journal. Huan Xu assisted with manuscript review and proofreading. Quanle Liu contributed suggestions on the graphical content. Mei Bie participated in reviewing the reference materials.

Corresponding author

Correspondence to Xiangjiu Che.

Ethics declarations

Ethical and informed consent for data used


Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, Y., Xu, H., Liu, Q. et al. A swin-transformer-based network with inductive bias ability for medical image segmentation. Appl Intell 55, 103 (2025). https://doi.org/10.1007/s10489-024-06029-1
