
Fine-Grained Multi-modal Fundus Image Generation Based on Diffusion Models for Glaucoma Classification

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14557)


Abstract

With the emergence of foundation models, the quality and generalisation ability of image generation methods have improved substantially. However, medical image generation remains a challenging yet promising task. Recently, diffusion-based models have become prominent in multi-modal image generation owing to their flexibility. Therefore, to address the scarcity of high-quality medical images and the high cost of annotation, we propose a fine-grained multi-modal fundus image generation method based on foundation models to explore an efficient form of data augmentation. First, we combine optic fundus images, fundus vessel images and class textual information to form a weakly supervised fine-tuning dataset. Then, building on the Stable Diffusion and ControlNet models, we fine-tune our method with LoRA to generate high-resolution fundus images of specific diseases in a targeted manner. Furthermore, we use these synthetic fundus images together with existing datasets for data augmentation or model fine-tuning to improve performance on the glaucoma classification task. Extensive experiments show that our method produces high-quality medical fundus images that can be applied effectively to real-world medical imaging tasks. Moreover, the experimental results show that the generated fundus images act as effective augmentation data, indicating that generation with foundation models is effective in certain domains.
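The LoRA fine-tuning the abstract refers to adapts a frozen pretrained weight matrix W by adding a trainable low-rank update scaled by alpha/r, so only a small fraction of the parameters are trained. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the dimensions and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A LoRA update factors a d_out x d_in weight delta through rank r,
# so only r * (d_in + d_out) parameters are trainable.
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.normal(size=(r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection (zero-init)

def lora_forward(x):
    # Frozen path plus scaled low-rank update; because B is zero at
    # initialization, the adapted layer starts out identical to W.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True at initialization
```

Here A and B together hold 32 trainable parameters versus 64 in W; in a full-size diffusion model the ratio is far more favourable, which is what makes targeted fine-tuning on a small weakly supervised fundus dataset practical.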


Notes

  1. https://huggingface.co/runwayml/stable-diffusion-v1-5.
  2. https://refuge.grand-challenge.org/.


Acknowledgement

This work is supported by the fund for building world-class universities (disciplines) of Renmin University of China. Computing resources were provided by the Public Computing Cloud Platform of Renmin University of China.

Author information


Corresponding author

Correspondence to Gang Yang.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, X. et al. (2024). Fine-Grained Multi-modal Fundus Image Generation Based on Diffusion Models for Glaucoma Classification. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_5


  • DOI: https://doi.org/10.1007/978-3-031-53302-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53301-3

  • Online ISBN: 978-3-031-53302-0

  • eBook Packages: Computer Science, Computer Science (R0)
