MAGIC: Rethinking Dynamic Convolution Design for Medical Image Segmentation

Authors: Qingyuan Xiang, Zheng Li

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 9106 - 9115

https://doi.org/10.1145/3664647.3680754

Published: 28 October 2024

Abstract

Dynamic convolution has recently been shown to improve CNN-based networks for medical image segmentation. The core idea is to replace a static convolutional kernel with a linear combination of multiple convolutional kernels, weighted by an input-dependent attention function. However, existing dynamic convolution designs suffer from two limitations: (i) the convolutional kernels are weighted by a single-dimensional attention function applied to the input maps, which overlooks the synergy in multi-dimensional information and results in sub-optimal convolutional kernels; (ii) linear kernel aggregation is inefficient, restricting the model's capacity to learn more intricate patterns. In this paper, we rethink the dynamic convolution design to address these limitations and propose multi-dimensional aggregation dynamic convolution (MAGIC). Specifically, MAGIC introduces a dimensional-reciprocal fusion module that captures correlations among input maps across the spatial, channel, and global dimensions simultaneously when computing convolutional kernels. Furthermore, we design a kernel recalculation module, which makes aggregation more efficient by learning the interactions between kernels. As a drop-in replacement for regular convolution, MAGIC can be flexibly integrated into prevalent pure CNN or hybrid CNN-Transformer backbones. Extensive experiments on four benchmarks demonstrate that MAGIC outperforms regular convolution and existing dynamic convolution. Code is available at: https://github.com/Segment82/MAGIC
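To make the baseline idea concrete, here is a minimal pure-Python sketch of the linear kernel aggregation that the abstract attributes to existing dynamic convolution. It is not an implementation of MAGIC, and it is not taken from the authors' repository. All names (`kernel_attention`, `aggregate_kernels`, `proj`) are hypothetical. The attention branch is assumed to be global average pooling followed by a single linear projection and a softmax, which is one common choice rather than the design used in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(feature_map):
    """Reduce a C x H x W nested list to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def kernel_attention(feature_map, proj):
    """Input-dependent attention weights over K candidate kernels.

    `proj` is a hypothetical K x C matrix standing in for the attention
    branch (an assumption made for this sketch).
    """
    pooled = global_average_pool(feature_map)
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in proj]
    return softmax(scores)


def aggregate_kernels(kernels, weights):
    """Linear combination sum_k weights[k] * kernels[k] of flattened kernels."""
    size = len(kernels[0])
    return [sum(w * k[i] for w, k in zip(weights, kernels)) for i in range(size)]


# Toy input: 2 channels of size 2 x 2, and K = 3 flattened 3 x 3 kernels.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
kernels = [[1.0] * 9, [0.0] * 9, [-1.0] * 9]

weights = kernel_attention(feature_map, proj)
dynamic_kernel = aggregate_kernels(kernels, weights)
```

In this sketch the attention weights depend only on per-channel statistics of the input, and the resulting kernel is a plain weighted sum of the candidates. These correspond to the two properties the abstract identifies as limitations: a single-dimensional attention function and purely linear aggregation.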

Index Terms

MAGIC: Rethinking Dynamic Convolution Design for Medical Image Segmentation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation

Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs: Jianfei Cai (Monash University, Australia); Mohan Kankanhalli (NUS, Singapore); Balakrishnan Prabhakaran (UT Dallas, USA); Susanne Boll (University of Oldenburg, Germany)

Program Chairs: Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia); Liang Zheng (Australian National University, Australia); Vivek K. Singh (Rutgers University, USA); Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands); Lexing Xie (Australian National University, Australia); Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Key Research and Development Program of China
National Natural Science Foundation of China
Science and Technology Planning Project of Sichuan Province

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
