A swin-transformer-based network with inductive bias ability for medical image segmentation

Gao, Yan; Xu, Huan; Liu, Quanle; Bie, Mei; Che, Xiangjiu

doi:10.1007/s10489-024-06029-1

A swin-transformer-based network with inductive bias ability for medical image segmentation

Published: 09 December 2024

Volume 55, article number 103, (2025)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Yan Gao¹,
Huan Xu¹,
Quanle Liu¹,
Mei Bie¹ &
…
Xiangjiu Che ORCID: orcid.org/0000-0003-3799-1963¹

151 Accesses
Explore all metrics

Abstract

Accurately segmenting organs, or diseased regions of varying sizes, is a challenge in medical image segmentation tasks with limited datasets. Although Transformer-based methods have self-attention mechanisms, excellent global modelling abilities and can effectively focus on crucial areas in medical images, they still face the intractable issues of computational complexity and excessive reliance on data. The Swin-Transformer has partly addressed the problem of computational complexity, however the method still requires large amounts of training data due to its lack of inductive bias capabilities. When applied to medical image segmentation tasks with small datasets, this leads to suboptimal performances. In contrast, the CNN-based methods can compensate for this limitation. To address this issue further, this paper proposes Swin-IBNet, which combines the Swin-Transformer with a CNN in a novel manner to imbue it with inductive bias capabilities, reducing its reliance on data. During the encoding process of Swin-IBNet, two novel and crucial modules, the feature fusion block (FFB) and the multiscale feature aggregation block (MSFA), are designed. The FFB is responsible for propagating the inductive bias capability to the Swin-Transformer encoder. Different from the previous use of multiscale features, MSFA efficiently leverages multiscale information from different layers through self-learning. This paper not only attempts to analyse the interpretability of the proposed Swin-IBNet but also performs more verifications on the public Synapse, ISIC 2018 and ACDC datasets. The experimental results show that Swin-IBNet is superior to the baseline method, Swin-Unet, and several state-of-the-art methods. Especially on the Synapse dataset, the DSC of Swin-IBNet surpasses that of Swin-Unet by 3.45%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

MBUTransNet: multi-branch U-shaped network fusion transformer architecture for medical image segmentation

Article 06 April 2023

Sub-pixel multi-scale fusion network for medical image segmentation

Article 11 October 2024

Swin-TransUper: Swin Transformer-based UperNet for medical image segmentation

Article 02 April 2024

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Data available statement

All the data used in this work are publicly available to the researchers.

References

Liu C, Xie H, Zha Z, Yu L, Chen Z, Zhang Y (2019) Bidirectional attention-recognition model for fine-grained object classification. IEEE Trans Multimedia 22(7):1785–1795
Article MATH Google Scholar
Min S, Yao H, Xie H, Zha Z, Zhang Y (2020) Domain-oriented semantic embedding for zero-shot learning. IEEE Trans Multimedia 23:3919–3930
Article MATH Google Scholar
Min S, Yao H, Xie H, Zha Z, Zhang Y (2020) Multi-objective matrix normalization for fine-grained visual recognition. IEEE Trans Image Process 29:4996–5009
Article MATH Google Scholar
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inform Process Syst pp 91–99
Barroso-Laguna A, Mikolajczyk K (2022) Key. net: Keypoint detection by handcrafted and learned cnn filters revisited. IEEE Trans Pattern Anal Mach Intell 45(1):698–711
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proc IEEE ICCV, pp 2980–2988
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proc. IEEE Conf comput vis pattern recognit pp 3431–3440
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proc 15th Eur conf pp 833–851
Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention., pp. 234–241
Tian Y, Yang G, Wang Z et al (2020) Instance segmentation of apple flowers using the improved mask r-cnn model. Biosys Eng 193:264–278
Article MATH Google Scholar
Han Z, Jian M, Wang G-G (2022) Convunext: An efficient convolution neural network for medical image segmentation. Knowl-based Syst 253
Vaswani Aea (2017) Attention is all you need. Advances in neural information processing systems., 6000–6010
Wang W et al (2021) Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 568–578
Yang Y, Zhang L, Ren L, Wang X (2023) Mmvit-seg: A lightweight transformer and cnn fusion network for covid-19 segmentation. Comput Methods Programs Biomed 230:106365
Article MATH Google Scholar
Li X et al (2023) Attransunet: An enhanced hybrid transformer architecture for ultrasound and histopathology image segmentation. Comput Biol Med 152:106365
Article MATH Google Scholar
Gao C, Ye H, Cao F, Wen C, Zhang Q, Zhang F (2021) Multiscale fused network with additive channel-spatial attention for image segmentation. Knowl-Based Syst 214:106754
Article MATH Google Scholar
Lin F, Liang Z, Wu S, He J, Chen K, Tian S (2023) Structtoken: Rethinking semantic segmentation with structural prior. IEEE Transactions on circuits and systems for video technology
Park K-B, Lee JY (2022) Swine-net: Hybrid deep learning approach to novel polyp segmentation using convolutional neural network and swin transformer. J Comput Des Eng 9(2):616–632
Liu Y, Wang H, Chen Z, Huangliang K, Zhang H (2022) Transu-net +: Redesigning the skip connection to enhance features in medical image segmentation. Knowl-Based Syst 256:109859
Article Google Scholar
Tang P et al (2022) Unified medical image segmentation by learning from uncertainty in an end-to-end manner. Knowl-Based Syst
Qi M et al (2022) Ftc-net: Fusion of transformer and cnn features for infrared small target detection. IEEE Journal of selected topics in applied earth observations and remote sensing. 15:8613–8623
Article MATH Google Scholar
Gao G, Xu Z, Li J et al (2023) Ctcnet: A cnn-transformer cooperation network for face image super-resolution. IEEE Trans Image Process pp 1978–1991
Li W, Xue L, Wang X et al (2023) Convtransnet: A cnn-transformer network for change detection with multi-scale global-local representations. IEEE Trans Geosci Remote Sens 61
Dosovitskiy A et al (2021) An image is worth 16x16 words: Transformers for image recognition at scale. In: International conference on learning representations (ICLR)
Liu Z et al (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022
Sun L, Zhao G, Zheng Y et al (2022) Spectral-spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans Geosci Remote Sens 60:1–14
Article MATH Google Scholar
Hong D, Han Z, Yao J et al (2021) Spectralformer: Rethinking hyperspectral image classification with transformers. IEEE Trans Geosci Remote Sens 60:1–15
Article MATH Google Scholar
Touvron H, Bojanowski P, Caron M et al (2022) Resmlp: Feedforward networks for image classification with data-efficient training. IEEE Trans Pattern Anal Mach Intell 45:5314–5321
Article MATH Google Scholar
Remote sensing image change detection with transformers (2021) Chen H, SZ. Qi Z. IEEE Trans Geosci Remote Sens 60:1–14
Google Scholar
Li K, Wang Y, Zhang J et al (2023) Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans Pattern Anal Mach Intell 45:12581–12600
Li Y, Yao T, Pan Y et al (2022) Contextual transformer networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 45:1489–1500
Article MATH Google Scholar
Chen J al (2021) Transunet: Transformers make strong encoders for medical image segmentation. CoRR. abs/2102.04306, pp 1–13
Cao H al (2022) Swin-unet: Unet-like pure transformer for medical image segmentation. In: European conference on computer vision (ECCV), pp 205–218
Wang L, Li R, Zhang C et al (2022) Unetformer: A unet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J Photogramm Remote Sens 190:196–214
Zhu Z, He X, Qi G et al (2023) Brain tumor segmentation based on the fusion of deep semantics and edge information in multimodal mri. Inform Fusion 91:376–387
Article MATH Google Scholar
Yuan F, Zhang Z, Fang Z (2023) An effective cnn and transformer complementary network for medical image segmentation. Pattern Recogn 136:109228
Article MATH Google Scholar
Zhang C, Jiang W, Zhang Y et al (2022) Transformer and cnn hybrid deep neural network for semantic segmentation of very-high-resolution remote sensing imagery. IEEE Trans Geosci Remote Sens 60:1–20
MATH Google Scholar
Ding W, Wang H, Huang J et al (2023) Ftranscnn: Fusing transformer and a cnn based on fuzzy logic for uncertain medical image segmentation. Inform Fusion 99: 101880
Zhao Z, Li Q, Zhang Z et al (2021) Combining a parallel 2d cnn with a self-attention dilated residual network for ctc-based discrete speech emotion recognition. Neural Netw 141:52–60
Article MATH Google Scholar
Mi Z, Jiang X, Sun T et al (2020) Gan-generated image detection with self-attention mechanism against gan generator defect. IEEE J Sel Top Signal Process 14:969–981
Article MATH Google Scholar
Zeng W, Li M (2020) Crop leaf disease recognition based on self-attention convolutional neural network. Comput Electron Agric 172:105341
Article MATH Google Scholar
Rao D, Xu T, Wu X (2023) Tgfuse: An infrared and visible image fusion approach based on transformer and generative adversarial network. IEEE Trans Image Process
Yu H, Xu Z, Zheng K et al (2022) Mstnet: A multilevel spectral-spatial transformer network for hyperspectral image classification. IEEE Trans Geosci Remote Sens 60:1–13
Wu H, Zhang M, Huang P et al (2024) Cmlformer: Cnn and multi-scale local-context transformer network for remote sensing images semantic segmentation. IEEE J Sel Top Appl Earth Obs Remote Sens pp 1–10
Geng Z, Chen Z, Meng Q et al (2021) Novel transformer based on gated convolutional neural network for dynamic soft sensor modeling of industrial processes. IEEE Trans Industr Inf 18:1521–1529
Article MATH Google Scholar
Song R, Feng Y, Cheng W et al (2022) Bs2t: Bottleneck spatial-spectral transformer for hyperspectral image classification. IEEE Trans Geosci Remote Sens 60:1–17
MATH Google Scholar
Xie X, Wu D, Xie M et al (2024) Ghostformer: Efficiently amalgamated cnn-transformer architecture for object detection. Pattern Recogn 148:110172
Article MATH Google Scholar
Kang J, Guan H, Ma L et al (2023) Waterformer: A coupled transformer and cnn network for waterbody detection in optical remotely-sensed imagery. ISPRS J Photogramm Remote Sens 206:222–241
Article Google Scholar
Wang C, Xu M, Jiang Y et al (2022) Translution-snet: A semisupervised hyperspectral image stripe noise removal based on transformer and cnn. IEEE Trans Geosci Remote Sens 60:1–14
Google Scholar
Zhang Q, Xu Y, Zhang J et al (2023) Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. Int J Comput Vision 131:1141–1162
Article MATH Google Scholar
Sartran L, Barrett S, Kuncoro A et al (2022) Transformer grammars: Augmenting transformer language models with syntactic inductive biases at scale. Trans Assoc Comput Linguist 10:1423–1439
Article Google Scholar
Hao S, Li N, Ye Y (2023) Inductive biased swin-transformer with cyclic regressor for remote sensing scene classification. IEEE J Sel Top Appl Earth Obs Remote Sens 16:6265–6278
Article MATH Google Scholar
Graham B et al (2021) Levit: A vision transformer in convnet’s clothing for faster inference. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 12259–12269
Zhang Q, Yang Y-B (2021) Rest: An efficient transformer for visual recognition. Adv Neural Inform Process Syst 34:15475–15485
Heo B, Yun S, Han D, Chun S, Choe J, Oh SJ (2021) Rethinking spatial dimensions of vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 11916–11925
Zhang Z et al (2022) Nested hierarchical transformer: Towards accurate data-efficient and interpretable visual understanding. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 3417–3425
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognitio (CVPR), pp 2117–2125
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448
Wang W et al (2022) Pvtv 2: Improved baselines with pyramid vision transformer. Comput Vis Media 8(3):1–10
Google Scholar
Xu W, Xu Y, Chang T, Tu Z (2021) Co-scale conv-attentional image transformers. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 9981–9990
Chen C-F, Fan Q, Panda R (2021) Crossvit: Cross-attention multi-scale vision transformer for image classification. In: Proceedings of the IEEE/CVF international conference on computer visio (ICCV), pp 357–366
Codella NCF, Gutman D, Celebi ME et al (2018) Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). In: 2018 IEEE 15th International symposium on biomedical imaging (ISBI 2018). IEEE, pp 168–172
Fu H, Xu Y, Lin S, Wong DWK, Liu J (2016) Deepvessel: Retinal vessel segmentation via deep learning and conditional random field. In: Medical image computing and computer-assisted intervention–MICCAI 2016: 19th international conference, pp 132–139
Wang H et al (2022) Mixed transformer u-net for medical image segmentation. In: ICASSP 2022-2022 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 2390–2394
Yan X, Tang H, Sun S, Ma H, Kong D, Xie X (2022) After-unet: Axial fusion transformer unet for medical image segmentation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3971–3981
Chang Y, Menghan H, Guangtao Z, Xiao-Ping Z (2022) Transclaw u-net: claw u-net with transformers for medical image segmentation. In: 2022 5th IEEE International conference information communication signal processing (ICICSP), pp 280–284
Xie Y, Zhang J, Shen C, Xia Y (2021) Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation. In: Medical image computing and computer assisted intervention–MICCAI 2021: 24th international conference, pp 171–180
Huang X, Deng Z, Li D, Yuan X, Fu Y (2022) Missformer: An effective transformer for 2d medical image segmentation. IEEE Trans Med Imaging
Center for Biomedical Image Computing & Analytics. https://www.med.upenn.edu/cbica/captk. Accessed 16 Sept 2023
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2018) Unet++: A nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, pp 3–11
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2020) Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans Med Imaging 39(6):1856–1867
Article Google Scholar
Wang J et al (2020) Deep high-resolution representation learning for visual recognition. IEEE Transactions on pattern analysis and machine intelligence, pp 5686–5696
Jha D, Riegler MA, Johansen D, Halvorsen P, Johansen HD (2020) Doubleu-net: A deep convolutional neural network for medical image segmentation. In: 2020 IEEE 33rd International symposium on computer-based medical systems (CBMS), pp 558–564
Jha D et al (2021) A comprehensive study on colorectal polyp segmentation with resunet++ conditional random field and test-time augmentation. IEEE J Biomed Health Inform 25(6):2029–2040
Article MATH Google Scholar
Srivastava A et al (2022) Msrf-net: A multi-scale residual fusion network for biomedical image segmentation. IEEE J Biomed Health Inform 26(5):2252–2263
Article MATH Google Scholar
Xu G et al (2022) Levit-unet: Make faster encoders with transformer for biomedical image segmentation. In: Chinese conference on pattern recognition and computer vision (PRCV)
Xu Q, Ma Z, He N, Duan W (2023) Dcsau-net: A deeper and more compact split-attention u-net for medical image segmentation. Comput Biol Med 154:106626

Download references

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China under Grant 62172184, the Science and Technology Development Plan of Jilin Province of China under Grants 20200401077GX and 20200201292JC and the Jilin Educational Scientific Research Leading Group (ZD21003, JS2338).

Author information

Authors and Affiliations

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
Yan Gao, Huan Xu, Quanle Liu, Mei Bie & Xiangjiu Che

Authors

Yan Gao
View author publications
You can also search for this author in PubMed Google Scholar
Huan Xu
View author publications
You can also search for this author in PubMed Google Scholar
Quanle Liu
View author publications
You can also search for this author in PubMed Google Scholar
Mei Bie
View author publications
You can also search for this author in PubMed Google Scholar
Xiangjiu Che
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Yan Gao conceived and designed the research, conducted the experiments and authored the paper, which was followed by iterative revisions of the manuscript. Professor Xiangjiu Che acted as the corresponding author, conducting manuscript checks, providing valuable insights, and managed communications with the journal, including correspondence and responses. Huan Xu assisted in manuscript reviews and proofreading. Quanle Liu contributed to graphical content suggestions. Mei Bie participated in the review of the reference materials.

Corresponding author

Correspondence to Xiangjiu Che.

Ethics declarations

Ethical and informed consent for data used

Ethical and informed consent for data used.

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Gao, Y., Xu, H., Liu, Q. et al. A swin-transformer-based network with inductive bias ability for medical image segmentation. Appl Intell 55, 103 (2025). https://doi.org/10.1007/s10489-024-06029-1

Download citation

Accepted: 16 September 2024
Published: 09 December 2024
DOI: https://doi.org/10.1007/s10489-024-06029-1

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A swin-transformer-based network with inductive bias ability for medical image segmentation

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

MBUTransNet: multi-branch U-shaped network fusion transformer architecture for medical image segmentation

Sub-pixel multi-scale fusion network for medical image segmentation

Swin-TransUper: Swin Transformer-based UperNet for medical image segmentation

Data available statement

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethical and informed consent for data used

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

A swin-transformer-based network with inductive bias ability for medical image segmentation

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

MBUTransNet: multi-branch U-shaped network fusion transformer architecture for medical image segmentation

Sub-pixel multi-scale fusion network for medical image segmentation

Swin-TransUper: Swin Transformer-based UperNet for medical image segmentation

Explore related subjects

Data available statement

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethical and informed consent for data used

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation