Abstract
Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, replicating this effectiveness in healthcare is challenging due to the limited availability of public datasets. Federated learning presents an exciting solution, allowing the use of extensive databases from hospitals and health centers without centralizing sensitive data, thus maintaining privacy and security. Yet research in multimodal federated learning, particularly in scenarios with missing modalities (a common issue in healthcare datasets), remains scarce, highlighting a critical area for future exploration. Toward this, we propose a novel method for multimodal federated learning with missing modalities. Our contribution lies in a novel cross-modal data augmentation by retrieval, leveraging a small publicly available dataset to fill the missing modalities in the clients. Our method learns the parameters in a federated manner, ensuring privacy protection, and improves performance on multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines. Code available: https://github.com/bhattarailab/CAR-MFL.
P. Shrestha and S. Amgain contributed equally.
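To make the abstract's central idea concrete, below is a minimal Python sketch of retrieval-based cross-modal augmentation combined with standard federated averaging. It is an illustration under stated assumptions, not the authors' released CAR-MFL implementation: the function names (retrieve_missing_text, fedavg) and the cosine-similarity retrieval criterion are hypothetical choices here; consult the linked repository for the actual method.

```python
# Minimal sketch (not the authors' released code): a client sample missing its
# text modality borrows the report paired with the most similar public image,
# and FedAvg aggregates the locally trained parameters across clients.
import copy
import torch
import torch.nn.functional as F

def retrieve_missing_text(image_emb, public_image_embs, public_texts):
    """Fill a missing report by nearest-neighbor retrieval over a small public
    paired dataset. Cosine similarity is an assumed retrieval criterion."""
    sims = F.cosine_similarity(image_emb.unsqueeze(0), public_image_embs, dim=-1)
    return public_texts[sims.argmax().item()]

def fedavg(client_states, client_sizes):
    """Size-weighted parameter averaging (FedAvg, McMahan et al., 2017)."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key].float() * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg

# Toy usage: three public image embeddings with paired reports, and one
# client image whose report is missing.
public_embs = torch.randn(3, 128)
public_texts = ["report A", "report B", "report C"]
query = torch.randn(128)
print(retrieve_missing_text(query, public_embs, public_texts))
```

The design intuition is that a small public paired dataset acts as a shared bridge: each client completes its unimodal samples locally via retrieval, so only model parameters, never patient data, leave the institution.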
Acknowledgments
This project is supported by the University of Aberdeen Startup grant CF10834-10.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Poudel, P., Shrestha, P., Amgain, S., Shrestha, Y.R., Gyawali, P., Bhattarai, B. (2024). CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15010. Springer, Cham. https://doi.org/10.1007/978-3-031-72117-5_10
DOI: https://doi.org/10.1007/978-3-031-72117-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72116-8
Online ISBN: 978-3-031-72117-5
eBook Packages: Computer Science, Computer Science (R0)