
ViPer: Visual Personalization of Generative Models via Individual Preference Learning

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Different users find different images generated for the same prompt desirable. This gives rise to personalized image generation, which involves creating images aligned with an individual's visual preference. Current generative models, however, are tuned to produce outputs that appeal to a broad audience and are therefore unpersonalized. Using them to generate images aligned with individual users relies on iterative manual prompt engineering by the user, which is inefficient and undesirable.

We propose to personalize the image generation process by, first, capturing the generic preferences of the user in a one-time process by inviting them to comment on a small selection of images, explaining why they like or dislike each. Based on these comments, we infer a user's structured liked and disliked visual attributes, i.e., their visual preference, using a large language model. These attributes are used to guide a text-to-image model toward producing images that are tuned to the individual user's visual preference. Through a series of user studies and large language model-guided evaluations, we demonstrate that the proposed method results in generations that are well aligned with individual users' visual preferences. Our code and model weights are open-sourced at https://viper.epfl.ch.
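To make the described pipeline concrete, the sketch below shows one minimal way such preference-conditioned generation could be wired up: an LLM step turns free-form user comments into structured liked/disliked visual attributes, which are then folded into the prompt and negative prompt of an off-the-shelf text-to-image model. This is an illustrative assumption, not the authors' implementation (available at https://viper.epfl.ch): the helper extract_preferences, the attribute strings, and the prompt-concatenation strategy are hypothetical, and SDXL via the diffusers library stands in for the personalized generator.

```python
import torch
from diffusers import StableDiffusionXLPipeline


def extract_preferences(comments):
    """Hypothetical stand-in for the LLM step: in the paper, free-form comments
    are sent to a large language model that returns structured liked/disliked
    visual attributes. A fixed example is returned here instead of an API call."""
    return {
        "liked": ["muted earth tones", "soft natural lighting", "minimal composition"],
        "disliked": ["oversaturated colors", "cluttered backgrounds"],
    }


def personalized_prompt(base_prompt, prefs):
    """Fold the inferred visual preference into the prompt and negative prompt."""
    prompt = base_prompt + ", " + ", ".join(prefs["liked"])
    negative = ", ".join(prefs["disliked"])
    return prompt, negative


comments = [
    "I like this one: the colors are calm and the scene feels uncluttered.",
    "Too flashy for me; the neon colors and busy background are distracting.",
]
prefs = extract_preferences(comments)
prompt, negative = personalized_prompt("a cabin by a lake at dusk", prefs)

# Any text-to-image backbone could be guided this way; SDXL is used here as an example.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
image = pipe(prompt=prompt, negative_prompt=negative).images[0]
image.save("personalized.png")
```

Injecting the attributes as plain prompt text is only the simplest possible guidance; the one-time preference extraction is the key idea, since the same structured preference can then be reused for every subsequent prompt without further user interaction.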


Notes

  1. openai.com.

  2. claude.ai.


Author information


Corresponding author

Correspondence to Sogand Salehi.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 86102 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Salehi, S., Shafiei, M., Yeo, T., Bachmann, R., Zamir, A. (2025). ViPer: Visual Personalization of Generative Models via Individual Preference Learning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-72904-1_23

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72903-4

  • Online ISBN: 978-3-031-72904-1

  • eBook Packages: Computer Science (R0)
