SFusion: Self-attention Based N-to-One Multimodal Fusion Block

Liu, Zecheng; Wei, Jia; Li, Rui; Zhou, Jianlong

doi:10.1007/978-3-031-43895-0_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14221)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

People perceive the world with different senses, such as sight, hearing, smell, and touch. Processing and fusing information from multiple modalities enables artificial intelligence to understand the world around us more easily. However, when modalities are missing, the number of available modalities varies from one situation to another, which leads to an N-to-One fusion problem. To solve this problem, we propose a self-attention based fusion block called SFusion. Unlike preset formulations or convolution based methods, the proposed block automatically learns to fuse the available modalities without synthesizing or zero-padding the missing ones. Specifically, the feature representations extracted by the upstream processing model are projected as tokens and fed into a self-attention module to generate latent multimodal correlations. Then, a modal attention mechanism is introduced to build a shared representation, which can be used by the downstream decision model. The proposed SFusion can be easily integrated into existing multimodal analysis networks. In this work, we apply SFusion to different backbone networks for human activity recognition and brain tumor segmentation tasks. Extensive experimental results show that the SFusion block achieves better performance than competing fusion strategies. Our code is available at https://github.com/scut-cszcl/SFusion.
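
The pipeline described in the abstract (tokens from the available modalities, self-attention over all of them, then a modal attention step that yields one shared representation) can be illustrated with a small sketch. This is not the authors' implementation, which is in the repository linked above. It assumes a single attention head and a single layer, uses random matrices as stand-ins for learned projections, and assumes that modal attention is an element-wise softmax across modalities that weights the original features. The modality names and all function names are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(d)])
    return out


def fuse(features, wq, wk, wv):
    """Fuse any number of available modalities into one T x D representation.

    features maps a modality name to T tokens of dimension D.
    Missing modalities are simply absent: nothing is synthesized or zero-padded.
    """
    names = sorted(features)
    t = len(features[names[0]])
    if any(len(features[n]) != t for n in names):
        raise ValueError("all modalities must provide the same number of tokens")
    d = len(features[names[0]][0])

    # 1) Concatenate the tokens of every available modality.
    tokens = [tok for n in names for tok in features[n]]
    # 2) Self-attention relates tokens within and across modalities.
    attended = self_attention(tokens, wq, wk, wv)
    per_mod = [attended[i * t:(i + 1) * t] for i in range(len(names))]

    # 3) Assumed modal attention: softmax across modalities at each position
    #    and channel, then a weighted sum of the original features.
    fused = []
    for pos in range(t):
        row = []
        for c in range(d):
            weights = softmax([per_mod[m][pos][c] for m in range(len(names))])
            row.append(sum(w * features[names[m]][pos][c]
                           for m, w in enumerate(weights)))
        fused.append(row)
    return fused


def rand_matrix(rows, cols):
    """Random stand-in for learned weights or extracted features."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
T, D = 3, 4  # tokens per modality, feature dimension (illustrative values)
WQ, WK, WV = (rand_matrix(D, D) for _ in range(3))
feats = {name: rand_matrix(T, D) for name in ("t1", "t2", "flair")}

full = fuse(feats, WQ, WK, WV)                      # three modalities available
partial = fuse({"t1": feats["t1"]}, WQ, WK, WV)     # only one available
print(len(full), len(full[0]), len(partial), len(partial[0]))
```

In this sketch the output shape does not depend on how many modalities are passed in, which is the N-to-One property. With a single modality the softmax weight is 1, so the block returns that modality's features unchanged.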

Acknowledgements

This work is supported in part by the Guangdong Provincial Natural Science Foundation (2023A1515011431), the Guangzhou Science and Technology Planning Project (202201010092), the National Natural Science Foundation of China (72074105), NSF-1850492 and NSF-2045804.

Author information

Authors and Affiliations

School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Zecheng Liu & Jia Wei
Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, USA
Rui Li
Data Science Institute, University of Technology Sydney, Ultimo, NSW, 2007, Australia
Jianlong Zhou

Corresponding author

Correspondence to Jia Wei.

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA; Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

Cite this paper

Liu, Z., Wei, J., Li, R., Zhou, J. (2023). SFusion: Self-attention Based N-to-One Multimodal Fusion Block. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14221. Springer, Cham. https://doi.org/10.1007/978-3-031-43895-0_15

DOI: https://doi.org/10.1007/978-3-031-43895-0_15
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43894-3
Online ISBN: 978-3-031-43895-0
eBook Packages: Computer Science, Computer Science (R0)
