DOI:10.1145/3622896.3622917
research-article

MSTCNet: Parallel Multi-Scale Network For Medical Image Segmentation

Published: 03 October 2023

Abstract

Transformer-like architectures, the model of choice in natural language processing, have recently been adapted to computer vision (CV) and have proven remarkably effective on various CV tasks. However, current transformer-based methods require large-scale datasets, which are usually unavailable in medical image analysis, and their performance suffers as a result. To this end, we propose a novel segmentation model, named MSTCNet, which adds a parallel multi-scale transformer (MST) encoder to U-Net. In MST, we devise multi-scale patch partition and multi-scale mix attention to model long-range dependencies at multiple scales. The U-Net encoder, running in parallel with MST, reduces the reliance on large-scale datasets and supplies complementary local features. We also propose a Feature Fusion Head to narrow the gap between convolutional features and transformer features. Extensive experiments demonstrate that MSTCNet outperforms state-of-the-art methods on the GlaS and ISIC18 datasets and is better suited to medical image segmentation with small-scale datasets.
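
The abstract does not say how multi-scale patch partition is implemented. As a rough illustration of the general idea only, the sketch below splits one image into non-overlapping patches at several patch sizes, producing one token sequence per scale. The function names, the patch sizes (2 and 4), and the toy 8x8 input are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]


def partition(image: Image, patch: int) -> List[List[float]]:
    """Split a 2-D image into non-overlapping patch x patch tokens.

    Each token is the flattened (row-major) content of one patch.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


def multi_scale_partition(
    image: Image, patch_sizes: Sequence[int] = (2, 4)
) -> Dict[int, List[List[float]]]:
    """Hypothetical helper: one token sequence per patch size."""
    return {p: partition(image, p) for p in patch_sizes}


# Toy 8x8 "image" whose pixel value encodes its position.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
scales = multi_scale_partition(image)
for patch, tokens in scales.items():
    print(f"patch={patch}: {len(tokens)} tokens of length {len(tokens[0])}")
```

Smaller patches give more, finer-grained tokens; larger patches give fewer tokens that each cover more context. An attention module would then have to relate tokens across these sequences, which this sketch does not attempt.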

Published In

CCRIS '23: Proceedings of the 2023 4th International Conference on Control, Robotics and Intelligent System
August 2023
215 pages
ISBN:9798400708190
DOI:10.1145/3622896
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. attention mechanism
  2. encoder-decoder
  3. medical image segmentation
  4. multi-scale
  5. vision transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Remote One-to-Many Wireless Charging Empowers the Internet of Things, Special Program of Leading Entrepreneurship Talent (Team) of Yongjiang River in Nanning City
  • South China University of Technology - Guilin Medical University 5G Intelligent Medical Platform and Demonstration Base Construction, Special Program of Base Construction and Outstanding Scholarship for Science and Technology Department of Guangxi Province

Conference

CCRIS 2023
