MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion

  • Original Paper
  • Published in: Pattern Analysis and Applications

Abstract

Medical image segmentation benefits greatly from accurate and efficient models. Although Convolutional Neural Networks (CNNs) and Transformer-based models are widely regarded as foundational methods for medical image segmentation, each has inherent drawbacks: CNNs often struggle to capture long-range dependencies because of their limited receptive fields, while Transformers capture long-range dependencies well but at a high computational cost. To address these challenges, State Space Models (SSMs) such as Mamba have emerged as a promising alternative, modeling long-range interactions while maintaining linear complexity. In this study, we present the Multi-Scale and Multi-View Frequency Mamba UNet (MSFM-UNet), a model designed to leverage Mamba's strengths for medical image segmentation. A Multi-Scale Feature Aggregation (MSFA) module merges the feature outputs of each encoder block with those of the decoder, and a Multi-View Frequency Enhancement (MVFA) module captures global and local views simultaneously, combining frequency-domain attributes to enrich feature representations across scales. We performed a comprehensive evaluation of MSFM-UNet on four widely used public datasets: ISIC17, ISIC18, Synapse, and ACDC. The experimental results demonstrate that MSFM-UNet outperforms current leading models in medical image segmentation. The code is publicly available at https://github.com/qczggaoqiang/MSFM-UNet.
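The full text is not included here, so the two ideas the abstract names can only be illustrated in spirit: fusing encoder features across scales, and enhancing features in the frequency domain. The following is a minimal NumPy sketch of those two generic techniques, not the authors' MSFA/MVFA implementation; all function names, gains, and the cutoff parameter are hypothetical choices for illustration.

```python
import numpy as np

def frequency_enhance(feat: np.ndarray, low_gain: float = 1.0,
                      high_gain: float = 1.5, cutoff: float = 0.25) -> np.ndarray:
    """Re-weight the spectrum of a 2-D feature map via the FFT.

    A radial mask splits the shifted spectrum at `cutoff` (normalised
    distance from the centre); frequencies beyond the cutoff are scaled
    by `high_gain`, the rest by `low_gain`.
    """
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    # Normalised radial distance from the spectrum centre.
    yy, xx = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2)
    gain = np.where(radius < cutoff, low_gain, high_gain)
    return np.fft.ifft2(np.fft.ifftshift(spectrum * gain)).real

def aggregate_scales(features: list) -> np.ndarray:
    """Fuse feature maps from different pyramid levels by
    nearest-neighbour upsampling to the finest resolution and summing."""
    target_h, target_w = features[0].shape
    fused = np.zeros((target_h, target_w))
    for f in features:
        ry, rx = target_h // f.shape[0], target_w // f.shape[1]
        fused += np.kron(f, np.ones((ry, rx)))  # nearest-neighbour upsample
    return fused

# Toy encoder pyramid: 16x16, 8x8, and 4x4 feature maps.
pyramid = [np.random.rand(16, 16), np.random.rand(8, 8), np.random.rand(4, 4)]
out = frequency_enhance(aggregate_scales(pyramid))
print(out.shape)  # (16, 16)
```

With `low_gain == high_gain == 1` the enhancement is an identity (the FFT round-trip leaves the map unchanged), which makes the gain mask the only tunable part of the sketch.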


[Figures 1–8: available in the full-text article]


Data availability

The experimental datasets are publicly available.


Funding

This work was supported by the Chongqing Technology Innovation and Application Development Project (CSTB2022TIAD-KPX0176, CSTB2023TIAD-KPX0050), the Fundamental Research Funds for the Central Universities (2022CDJYGRH-015), a key joint project of the Chongqing Health Commission and Science and Technology Bureau (2024ZDXM007), the Project of the Chongqing Key Laboratory of Emergency Medicine (2024RCCX01), and the Key Project of the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-K202400106).

Author information

Contributions

Qiang Gao: Conceptualization, Methodology, Software, Data curation, Writing—original draft. Yi Wang: Supervision, Writing—review and editing. Feiyan Zhou: Validation, Writing—review and editing. Jing Wen: Writing—review and editing. Yong Li: Supervision, Validation. Bin Fang: Validation. Peng Chen: Validation. Lan Du: Supervision, Validation. Cunjian Chen: Supervision, Writing—review and editing.

Corresponding author

Correspondence to Yi Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gao, Q., Wang, Y., Zhou, F. et al. MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion. Pattern Anal Applic 28, 17 (2025). https://doi.org/10.1007/s10044-024-01384-8

