Abstract
With the emergence of foundation models, the generation quality and generalisation ability of image generation methods have improved substantially. Medical image generation, however, remains a challenging yet promising task. Recently, diffusion-based models have become prominent in multi-modal image generation owing to their flexibility. To address the scarcity of high-quality medical images and the high cost of annotation, we propose a fine-grained multi-modal fundus image generation method built on foundation models and investigate it as an efficient means of data augmentation. First, we combine optic fundus images, fundus vessel images, and class textual information into a weakly supervised fine-tuning dataset. Then, building on Stable Diffusion and ControlNet, we fine-tune our method with LoRA to generate high-resolution fundus images of specific diseases in a targeted manner. Furthermore, we use these synthetic fundus images together with existing datasets for data augmentation or model fine-tuning, improving performance on the glaucoma classification task. Extensive experiments show that our method produces high-quality medical fundus images and transfers well to real-world medical imaging tasks. Moreover, the experimental results show that the generated fundus images are effective as augmentation, indicating that generation with foundation models is effective in certain domains.
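To make the generation stage of the abstract concrete, the following is a minimal sketch (not the authors' released code) of vessel-conditioned fundus image synthesis with the diffusers library: Stable Diffusion guided spatially by a ControlNet and specialised with LoRA weights. The ControlNet checkpoint, the LoRA path, and the prompt are placeholders/assumptions; the paper trains its own ControlNet and LoRA on (fundus image, vessel map, class text) triplets.

```python
# Sketch: ControlNet-conditioned Stable Diffusion with LoRA weights on top.
# All model IDs and file paths below are assumptions for illustration only.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Placeholder ControlNet; the paper fine-tunes one on fundus vessel maps.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical LoRA weights fine-tuned for disease-specific fundus generation.
pipe.load_lora_weights("path/to/fundus_glaucoma_lora")

# A fundus vessel image serves as the spatial condition; class text as the prompt.
vessel_map = load_image("vessel_map.png")
image = pipe(
    prompt="a color fundus photograph with glaucoma",
    image=vessel_map,
    num_inference_steps=30,
).images[0]
image.save("synthetic_glaucoma_fundus.png")
```

The synthetic images produced this way can then be mixed with a real dataset such as Drishti-GS to augment training of a glaucoma classifier, as the abstract describes.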
Acknowledgement
This work is supported by the fund for building world-class universities (disciplines) of Renmin University of China. Computing resources were provided by the Public Computing Cloud Platform of Renmin University of China.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X. et al. (2024). Fine-Grained Multi-modal Fundus Image Generation Based on Diffusion Models for Glaucoma Classification. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_5
DOI: https://doi.org/10.1007/978-3-031-53302-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53301-3
Online ISBN: 978-3-031-53302-0