
Eta Inversion: Designing an Optimal Eta Function for Diffusion-Based Real Image Editing

Conference paper in Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15072)


Abstract

Diffusion models have achieved remarkable success in the domain of text-guided image generation and, more recently, in text-guided image editing. A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image, which is then denoised to achieve the desired edits. However, current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image. To overcome these limitations, we introduce a novel and adaptable diffusion inversion technique for real image editing, which is grounded in a theoretical analysis of the role of \(\eta \) in the DDIM sampling equation for enhanced editability. By designing a universal diffusion inversion method with a time- and region-dependent \(\eta \) function, we enable flexible control over the editing extent. Through a comprehensive series of quantitative and qualitative assessments, involving a comparison with a broad array of recent methods, we demonstrate the superiority of our approach. Our method not only sets a new benchmark in the field but also significantly outperforms existing strategies.

W. Kang and K. Galim contributed equally.



Author information

Correspondence to Hyung Il Koo.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 23,957 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kang, W., Galim, K., Koo, H.I. (2025). Eta Inversion: Designing an Optimal Eta Function for Diffusion-Based Real Image Editing. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15072. Springer, Cham. https://doi.org/10.1007/978-3-031-72630-9_6


  • DOI: https://doi.org/10.1007/978-3-031-72630-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72629-3

  • Online ISBN: 978-3-031-72630-9

  • eBook Packages: Computer Science; Computer Science (R0)
