Abstract
Accurately segmenting organs or diseased regions of varying sizes is challenging in medical image segmentation tasks with limited datasets. Transformer-based methods use self-attention, model global context well and can focus on crucial areas in medical images, but they suffer from high computational complexity and a heavy reliance on data. The Swin-Transformer partly addresses the computational complexity; however, it still requires large amounts of training data because it lacks inductive bias. When applied to medical image segmentation tasks with small datasets, this leads to suboptimal performance. CNN-based methods, in contrast, can compensate for this limitation. This paper therefore proposes Swin-IBNet, which combines the Swin-Transformer with a CNN in a novel manner to imbue it with inductive bias and reduce its reliance on data. Two novel modules are designed for the encoding process of Swin-IBNet: the feature fusion block (FFB) and the multiscale feature aggregation block (MSFA). The FFB propagates the inductive bias capability to the Swin-Transformer encoder. Unlike previous uses of multiscale features, MSFA efficiently leverages multiscale information from different layers through self-learning. This paper analyses the interpretability of the proposed Swin-IBNet and verifies it on the public Synapse, ISIC 2018 and ACDC datasets. The experimental results show that Swin-IBNet is superior to the baseline method, Swin-Unet, and to several state-of-the-art methods. In particular, on the Synapse dataset, the Dice similarity coefficient (DSC) of Swin-IBNet surpasses that of Swin-Unet by 3.45%.
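For readers unfamiliar with the DSC reported above, the following is a minimal sketch of how the Dice similarity coefficient is commonly computed for a pair of binary masks, DSC = 2|A ∩ B| / (|A| + |B|). The function name `dice_score`, the toy masks and the empty-mask convention are illustrative assumptions; the evaluation protocol actually used in the paper (per-class averaging, 2D versus 3D volumes) is not described in this section.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    # Number of pixels labelled foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)

# Toy 2x3 masks: 3 foreground pixels each, 2 of them shared.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, target))  # 2 * 2 / (3 + 3)
```

A multi-organ benchmark would typically apply such a function per class and average the results, but that aggregation step is an assumption here rather than something stated in the abstract.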
Data availability statement
All the data used in this work are publicly available to the researchers.
Acknowledgements
This work was financially supported by the National Natural Science Foundation of China under Grant 62172184, the Science and Technology Development Plan of Jilin Province of China under Grants 20200401077GX and 20200201292JC and the Jilin Educational Scientific Research Leading Group (ZD21003, JS2338).
Author information
Authors and Affiliations
Contributions
Yan Gao conceived and designed the research, conducted the experiments and authored the paper, followed by iterative revisions of the manuscript. Professor Xiangjiu Che acted as the corresponding author, checking the manuscript, providing valuable insights and managing communications with the journal, including correspondence and responses. Huan Xu assisted in manuscript reviews and proofreading. Quanle Liu contributed suggestions on graphical content. Mei Bie participated in the review of the reference materials.
Corresponding author
Ethics declarations
Ethical and informed consent for data used
Ethical and informed consent for data used.
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, Y., Xu, H., Liu, Q. et al. A swin-transformer-based network with inductive bias ability for medical image segmentation. Appl Intell 55, 103 (2025). https://doi.org/10.1007/s10489-024-06029-1
DOI: https://doi.org/10.1007/s10489-024-06029-1