DOI: 10.1145/3664647.3680754
research-article

MAGIC: Rethinking Dynamic Convolution Design for Medical Image Segmentation

Published: 28 October 2024

Abstract

Recently, dynamic convolution has boosted the performance of CNN-based networks in medical image segmentation. The core idea is to replace a static convolutional kernel with a linear combination of multiple convolutional kernels, conditioned on an input-dependent attention function. However, the existing dynamic convolution design suffers from two limitations: i) The convolutional kernels are weighted by applying a single-dimensional attention function to the input maps, overlooking the synergy in multi-dimensional information. This results in sub-optimal computation of the convolution kernels. ii) The linear kernel aggregation is inefficient, restricting the model's capacity to learn more intricate patterns. In this paper, we rethink the dynamic convolution design to address these limitations and propose multi-dimensional aggregation dynamic convolution (MAGIC). Specifically, MAGIC introduces a dimensional-reciprocal fusion module that captures correlations among input maps across the spatial, channel, and global dimensions simultaneously when computing convolutional kernels. Furthermore, we design a kernel recalculation module, which enhances the efficiency of aggregation by learning the interaction between kernels. As a drop-in replacement for regular convolution, MAGIC can be flexibly integrated into prevalent pure CNN or hybrid CNN-Transformer backbones. Extensive experiments on four benchmarks demonstrate that MAGIC outperforms regular convolution and existing dynamic convolution. Code is available at: https://github.com/Segment82/MAGIC
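The baseline idea described in the abstract (a static kernel replaced by an attention-weighted linear combination of several kernels) can be illustrated with a minimal sketch. This is not the MAGIC method and does not reproduce its dimensional-reciprocal fusion or kernel recalculation modules. It assumes a 1D signal, a single pooled statistic as the attention input, and hypothetical names (`attention_weights`, `aggregate_kernels`, `dynamic_conv1d`, `gate_weights`, `gate_biases`) chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(signal, gate_weights, gate_biases):
    """Score each candidate kernel from one pooled statistic of the input.

    Assumption: global average pooling followed by one linear gate per
    kernel, i.e. a single-dimensional attention function.
    """
    pooled = sum(signal) / len(signal)
    scores = [w * pooled + b for w, b in zip(gate_weights, gate_biases)]
    return softmax(scores)


def aggregate_kernels(kernels, weights):
    """Linear combination of the candidate kernels, tap by tap."""
    size = len(kernels[0])
    return [sum(w * k[i] for w, k in zip(weights, kernels)) for i in range(size)]


def conv1d_valid(signal, kernel):
    """Sliding dot product without padding (cross-correlation, the usual CNN convention)."""
    steps = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(steps)
    ]


def dynamic_conv1d(signal, kernels, gate_weights, gate_biases):
    """Build one input-dependent kernel, then apply it to the same input."""
    weights = attention_weights(signal, gate_weights, gate_biases)
    kernel = aggregate_kernels(kernels, weights)
    return conv1d_valid(signal, kernel), weights


if __name__ == "__main__":
    signal = [1.0, 2.0, 4.0, 7.0, 11.0]
    kernels = [[1.0, 0.0, -1.0], [0.25, 0.5, 0.25]]  # illustrative values only
    output, weights = dynamic_conv1d(signal, kernels, [0.5, -0.5], [0.0, 0.1])
    print(weights)
    print(output)
```

Because the aggregation is linear, convolving with the combined kernel gives the same result as mixing the outputs of the individual kernels with the same weights. The sketch makes that property easy to check, which is the linear aggregation that limitation ii) refers to.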

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dimensional-reciprocal fusion
    2. kernel recalculation
    3. medical image segmentation
    4. multi-dimensional aggregation dynamic convolution

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
