MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion

  • Original Paper
  • Published in: Pattern Analysis and Applications

Abstract

Medical image segmentation benefits greatly from accurate and efficient models. Although Convolutional Neural Networks (CNNs) and Transformer-based models are widely regarded as foundational methods for medical image segmentation, each has inherent drawbacks: CNNs often struggle to capture long-range dependencies because of their limited receptive fields, while Transformers capture long-range dependencies well but at a high computational cost. To address these challenges, State Space Models (SSMs) such as Mamba have emerged as a promising alternative, modeling long-range interactions while maintaining linear complexity. In this study, we present the Multi-Scale and Multi-View Frequency Mamba UNet (MSFM-UNet), a model designed to leverage Mamba's strengths for medical image segmentation. A Multi-Scale Feature Aggregation (MSFA) module merges the feature outputs of each encoder block with those of the decoder, and a Multi-View Frequency Enhancement (MVFA) module captures global and local views simultaneously, combining frequency-domain attributes to enrich feature representations across scales. We performed a comprehensive evaluation of MSFM-UNet on four widely used public datasets: ISIC17, ISIC18, Synapse, and ACDC. The experimental results demonstrate that MSFM-UNet outperforms current leading models in medical image segmentation. The code is publicly available at https://github.com/qczggaoqiang/MSFM-UNet.
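The full text is not included here, so the two ideas the abstract names can only be illustrated in spirit: fusing encoder features across scales, and enhancing features in the frequency domain. The following is a minimal NumPy sketch of those two generic techniques, not the authors' MSFA/MVFA implementation; all function names, gains, and the cutoff parameter are hypothetical choices for illustration.

```python
import numpy as np

def frequency_enhance(feat: np.ndarray, low_gain: float = 1.0,
                      high_gain: float = 1.5, cutoff: float = 0.25) -> np.ndarray:
    """Re-weight the spectrum of a 2-D feature map via the FFT.

    A radial mask splits the shifted spectrum at `cutoff` (normalised
    distance from the centre); frequencies beyond the cutoff are scaled
    by `high_gain`, the rest by `low_gain`.
    """
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    # Normalised radial distance from the spectrum centre.
    yy, xx = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2)
    gain = np.where(radius < cutoff, low_gain, high_gain)
    return np.fft.ifft2(np.fft.ifftshift(spectrum * gain)).real

def aggregate_scales(features: list) -> np.ndarray:
    """Fuse feature maps from different pyramid levels by
    nearest-neighbour upsampling to the finest resolution and summing."""
    target_h, target_w = features[0].shape
    fused = np.zeros((target_h, target_w))
    for f in features:
        ry, rx = target_h // f.shape[0], target_w // f.shape[1]
        fused += np.kron(f, np.ones((ry, rx)))  # nearest-neighbour upsample
    return fused

# Toy encoder pyramid: 16x16, 8x8, and 4x4 feature maps.
pyramid = [np.random.rand(16, 16), np.random.rand(8, 8), np.random.rand(4, 4)]
out = frequency_enhance(aggregate_scales(pyramid))
print(out.shape)  # (16, 16)
```

With `low_gain == high_gain == 1` the enhancement is an identity (the FFT round-trip leaves the map unchanged), which makes the gain mask the only tunable part of the sketch.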


[Figures 1–8: available in the full-text article]


Data availability

The experimental datasets are publicly available.


Funding

This work was supported by the Chongqing Technology Innovation and Application Development Project (CSTB2022TIAD-KPX0176, CSTB2023TIAD-KPX0050), the Fundamental Research Funds for the Central Universities (2022CDJYGRH-015), a key joint project of the Chongqing Health Commission and Science and Technology Bureau (2024ZDXM007), the Project of the Chongqing Key Laboratory of Emergency Medicine (2024RCCX01), and the Key Project of the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-K202400106).

Author information

Contributions

Qiang Gao: Conceptualization, Methodology, Software, Data curation, Writing—original draft. Yi Wang: Supervision, Writing—review and editing. Feiyan Zhou: Validation, Writing—review and editing. Jing Wen: Writing—review and editing. Yong Li: Supervision, Validation. Bin Fang: Validation. Peng Chen: Validation. Lan Du: Supervision, Validation. Cunjian Chen: Supervision, Writing—review and editing.

Corresponding author

Correspondence to Yi Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gao, Q., Wang, Y., Zhou, F. et al. MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion. Pattern Anal Applic 28, 17 (2025). https://doi.org/10.1007/s10044-024-01384-8

