Abstract
Medical image segmentation benefits greatly from accurate and efficient models. Although CNNs and Transformer-based models are widely regarded as foundational methods for medical image segmentation, each has inherent drawbacks: Convolutional Neural Networks (CNNs) struggle to capture long-range relationships because of their limited receptive fields, whereas Transformers capture long-range relationships well but at a high computational cost. To address these challenges, State Space Models (SSMs) such as Mamba have emerged as a promising alternative, modeling long-range interactions while maintaining linear complexity. In this study, we present the Multi-Scale and Multi-View Frequency Mamba UNet (MSFM-UNet), a model designed to leverage Mamba's unique strengths for medical image segmentation. The Multi-Scale Feature Aggregation (MSFA) module merges the feature outputs of each encoder block with those of the decoder. The Multi-View Frequency Enhancement (MVFA) module captures global and local perspectives simultaneously, combining frequency-domain attributes to improve feature representation across multiple scales. We performed a comprehensive evaluation of MSFM-UNet on four widely used public datasets: ISIC17, ISIC18, Synapse, and ACDC. The experimental results demonstrate that MSFM-UNet outperforms current leading models in medical image segmentation. The code is publicly available at https://github.com/qczggaoqiang/MSFM-UNet.
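To illustrate the abstract's claim that SSMs model long-range interactions at linear cost in sequence length, here is a minimal, generic sketch of a discrete state-space recurrence in NumPy. This is an illustrative toy (the matrices `A`, `B`, `C` and the function `ssm_scan` are hypothetical), not the MSFM-UNet or Mamba implementation, which additionally uses input-dependent (selective) parameters:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Discrete state-space recurrence:
        x_t = A @ x_{t-1} + B * u_t
        y_t = C @ x_t
    A single pass over the sequence gives O(L) cost in length L,
    in contrast to the O(L^2) pairwise attention of Transformers."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t   # update hidden state with current input
        ys.append(C @ x)      # read out the observation
    return np.array(ys)

# Toy 2-state system driven by a constant length-6 input.
A = np.array([[0.9, 0.0],
              [0.1, 0.8]])
B = np.array([1.0, 0.0])
C = np.array([0.0, 1.0])
y = ssm_scan(A, B, C, np.ones(6))
```

Because the hidden state `x` summarizes the entire history seen so far, `y_t` can depend on arbitrarily distant inputs even though each step does constant work.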








Data availability
The experimental datasets are publicly available.
Funding
This work was supported by the Chongqing Technology Innovation and Application Development Project (CSTB2022TIAD-KPX0176, CSTB2023TIAD-KPX0050), Fundamental Research Funds for the Central Universities (2022CDJYGRH-015), a key joint project of Chongqing Health Commission and Science and Technology Bureau (2024ZDXM007), Project of Chongqing Key Laboratory of Emergency Medicine (2024RCCX01), Key Project of Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K202400106).
Author information
Contributions
Qiang Gao: Conceptualization, Methodology, Software, Data curation, Writing—original draft. Yi Wang: Supervision, Writing—review and editing. Feiyan Zhou: Validation, Writing—review and editing. Jing Wen: Writing—review and editing. Yong Li: Supervision, Validation. Bin Fang: Validation. Peng Chen: Validation. Lan Du: Supervision, Validation. Cunjian Chen: Supervision, Writing—review and editing.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, Q., Wang, Y., Zhou, F. et al. MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion. Pattern Anal Applic 28, 17 (2025). https://doi.org/10.1007/s10044-024-01384-8