ABSTRACT
In medical imaging, different magnetic resonance imaging (MRI) modalities provide complementary information. However, one or more modalities are often absent because of image corruption, artifacts, acquisition protocols, allergies to contrast agents, or cost constraints, which makes it difficult for an incomplete-modality segmentation model to perceive which modalities are missing. In this work, we introduce Modal-aware Visual Prompting (MAVP), a novel incomplete multi-modal segmentation framework inspired by the pre-training and prompt adjustment protocol widely used in natural language processing (NLP). In contrast to previous prompts, which typically use textual network embeddings, our prompts are embeddings generated by a modality state classifier that focuses on the missing-modality states. We integrate these modality state prompts into both the extraction stage of each modality and the modality fusion stage to facilitate intra- and inter-modal adaptation. Our approach achieves state-of-the-art performance in various modality-incomplete scenarios compared to incomplete modality-specific solutions.
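The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a four-modality, BraTS-style input and replaces the learned modality state classifier with a simple presence check. The names `MODALITIES`, `W_PROMPT`, `state_prompt`, `encode`, and `segment_features` are hypothetical. The sketch only shows a state-derived prompt being injected once per modality (intra-modal) and once more at fusion (inter-modal).

```python
import numpy as np

# Assumed modality set (BraTS-style); the abstract does not list the modalities.
MODALITIES = ("flair", "t1", "t1ce", "t2")
EMBED_DIM = 8

rng = np.random.default_rng(0)
# Hypothetical projection from the availability state to a prompt vector.
# In a trained model these weights would be learned; here they are random.
W_PROMPT = rng.normal(size=(len(MODALITIES), EMBED_DIM))


def modality_state(volumes):
    """Return a 0/1 availability vector; a missing modality is given as None.

    Stand-in for the modality state classifier, which in the paper is a
    network rather than a lookup.
    """
    return np.array([0.0 if volumes.get(m) is None else 1.0 for m in MODALITIES])


def state_prompt(state):
    """Map the availability vector to one prompt token of shape (1, EMBED_DIM)."""
    return (state @ W_PROMPT)[None, :]


def encode(volume):
    """Stand-in encoder: reshape a flat volume into tokens of width EMBED_DIM."""
    return volume.reshape(-1, EMBED_DIM)


def segment_features(volumes):
    state = modality_state(volumes)
    prompt = state_prompt(state)
    per_modality = []
    for m in MODALITIES:
        if volumes.get(m) is None:
            continue
        tokens = encode(volumes[m])
        # Intra-modal adaptation: prepend the prompt to this modality's tokens.
        per_modality.append(np.concatenate([prompt, tokens], axis=0))
    # Inter-modal adaptation: prepend the prompt again when fusing modalities.
    fused = np.concatenate([prompt] + per_modality, axis=0)
    return state, fused


# Example: T1 and T2 are missing.
volumes = {
    "flair": rng.normal(size=32),
    "t1": None,
    "t1ce": rng.normal(size=32),
    "t2": None,
}
state, fused = segment_features(volumes)
```

Prepending the prompt as an extra token is one common way to inject a prompt; it was chosen here for brevity and is not confirmed by the abstract.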
Index Terms
- Modal-aware Visual Prompting for Incomplete Multi-modal Brain Tumor Segmentation