CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities

  • Conference paper
  • In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Abstract

Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, applying it in healthcare is challenging due to the limited availability of public datasets. Federated learning offers a promising solution: it allows extensive databases from hospitals and health centers to be used without centralizing sensitive data, thus maintaining privacy and security. Yet research on multimodal federated learning, particularly in scenarios with missing modalities, a common issue in healthcare datasets, remains scarce, highlighting a critical area for exploration. Toward this, we propose a novel method for multimodal federated learning with missing modalities. Our contribution is a cross-modal data augmentation by retrieval that leverages a small publicly available dataset to fill in the missing modalities at the clients. Our method learns its parameters in a federated manner, ensuring privacy protection, and improves performance on multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines. Code available: https://github.com/bhattarailab/CAR-MFL.

P. Shrestha and S. Amgain contributed equally.
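The core augmentation step described in the abstract can be sketched as follows: when a client sample lacks one modality (say, an X-ray without its radiology report), the client embeds the modality it does have, retrieves the most similar sample from the small public paired dataset, and borrows the retrieved pair's other modality to complete the sample before local training. The sketch below is a minimal illustration of that idea only; it is not the authors' implementation, and every name in it (the embedding inputs, `fill_missing_text`, the cosine-similarity retrieval) is an assumption made for illustration. The actual method is in the repository linked above.

```python
# Illustrative sketch of retrieval-based cross-modal augmentation, NOT the
# authors' implementation (see https://github.com/bhattarailab/CAR-MFL).
# Assumes image and text embeddings were already produced by some encoder.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def fill_missing_text(client_img_embs, public_img_embs, public_texts, k=1):
    """For each client image without a paired report, retrieve the report(s)
    attached to the k most similar images in the public paired dataset."""
    sims = cosine_similarity(client_img_embs, public_img_embs)  # (n, m)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of nearest public pairs
    return [[public_texts[j] for j in row] for row in topk]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    client_imgs = rng.normal(size=(4, 128))    # client images missing reports
    public_imgs = rng.normal(size=(100, 128))  # public images with paired reports
    public_texts = [f"report_{i}" for i in range(100)]
    print(fill_missing_text(client_imgs, public_imgs, public_texts, k=1))
```

In the paper's federated setting, each client would apply such augmentation locally, so the completed multimodal pairs never leave the client; only model parameters are shared with the server, consistent with the privacy guarantees stated in the abstract.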



Acknowledgments

This project is supported by the University of Aberdeen Startup grant CF10834-10.

Author information

Correspondence to Binod Bhattarai.


Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 4841 KB)


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Poudel, P., Shrestha, P., Amgain, S., Shrestha, Y.R., Gyawali, P., Bhattarai, B. (2024). CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15010. Springer, Cham. https://doi.org/10.1007/978-3-031-72117-5_10


  • DOI: https://doi.org/10.1007/978-3-031-72117-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72116-8

  • Online ISBN: 978-3-031-72117-5

  • eBook Packages: Computer Science, Computer Science (R0)
