Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15142)


Abstract

In Few-Shot Image Generation (FSIG) with Deep Generative Models (DGMs), accurately estimating the target domain distribution from minimal samples poses a significant challenge: a method must capture both the broad diversity and the true characteristics of that distribution. We present Conditional Relaxing Diffusion Inversion (CRDI), an innovative ‘training-free’ approach designed to enhance distribution diversity in synthetic image generation. Unlike conventional methods, CRDI does not rely on fine-tuning with only a few samples. Instead, it reconstructs each target image instance and expands diversity through few-shot learning. The approach first identifies a Sample-wise Guidance Embedding (SGE) for the diffusion model, which serves a purpose analogous to the explicit latent codes in certain Generative Adversarial Network (GAN) models. A scheduler then progressively perturbs the SGE, thereby augmenting diversity. Comprehensive experiments demonstrate that our method outperforms GAN-based reconstruction techniques and achieves performance comparable to state-of-the-art (SOTA) FSIG methods. It also effectively mitigates overfitting and catastrophic forgetting, common drawbacks of fine-tuning approaches. Code is available at GitHub.
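
The abstract describes a two-stage procedure: recover a per-sample guidance embedding that lets a frozen diffusion model reconstruct a target image, then perturb that embedding under a schedule during sampling to increase diversity. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' released implementation: DenoiserStub, fit_sge, perturbed_sge, the toy network, and all hyper-parameters are illustrative assumptions (the actual method operates on a pretrained diffusion U-Net).

```python
import torch
import torch.nn as nn


class DenoiserStub(nn.Module):
    # Stand-in for a frozen, pretrained diffusion denoiser eps(x_t, t, c)
    # that accepts an extra guidance embedding c; a real model would be a
    # U-Net. Shapes: x_t (B, 3, 32, 32), t (B,), c (B, emb_dim).
    def __init__(self, img_dim=3 * 32 * 32, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + emb_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, img_dim),
        )

    def forward(self, x_t, t, c):
        h = torch.cat([x_t.flatten(1), c, t.float().unsqueeze(1)], dim=1)
        return self.net(h).view_as(x_t)


def fit_sge(model, x0, alphas_bar, emb_dim=128, steps=500, lr=1e-2):
    # Stage 1: optimise a Sample-wise Guidance Embedding so the frozen
    # denoiser reconstructs x0 under the standard DDPM noise-prediction
    # loss. Only the embedding is trained; no model weights change,
    # which is what makes the approach 'training-free'.
    sge = torch.zeros(x0.size(0), emb_dim, requires_grad=True)
    opt = torch.optim.Adam([sge], lr=lr)
    T = alphas_bar.numel()
    for _ in range(steps):
        t = torch.randint(0, T, (x0.size(0),))
        noise = torch.randn_like(x0)
        ab = alphas_bar[t].view(-1, 1, 1, 1)
        x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise
        loss = (model(x_t, t, sge) - noise).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sge.detach()


def perturbed_sge(sge, t, T, max_sigma=0.5):
    # Stage 2: scheduler that progressively perturbs the SGE. Here we
    # assume one plausible schedule: Gaussian perturbations that are
    # large early in the reverse trajectory (t near T) and vanish near
    # t = 0, trading early-step diversity for late-step fidelity.
    return sge + max_sigma * (t / T) * torch.randn_like(sge)
```

Under these assumptions, each reverse sampling step t would condition the denoiser on perturbed_sge(sge, t, T) rather than the fixed embedding, so each trajectory drifts toward a distinct yet faithful variant of the target image.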



Acknowledgment

This work was partially supported by Veritone and Adobe, and used Queen Mary’s Apocrita HPC facility, provided by QMUL Research-IT.

Author information


Corresponding author

Correspondence to Yu Cao.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 29159 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Cao, Y., Gong, S. (2025). Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15142. Springer, Cham. https://doi.org/10.1007/978-3-031-72907-2_2

  • DOI: https://doi.org/10.1007/978-3-031-72907-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72906-5

  • Online ISBN: 978-3-031-72907-2

  • eBook Packages: Computer Science, Computer Science (R0)
