Abstract
Different users find different images generated for the same prompt desirable. This gives rise to personalized image generation, which involves creating images aligned with an individual's visual preference. Current generative models, however, are tuned to produce outputs that appeal to a broad audience and are therefore unpersonalized. Using them to generate images aligned with an individual user relies on iterative manual prompt engineering by the user, which is inefficient and undesirable.
We propose to personalize the image generation process by first capturing the generic preferences of the user in a one-time process: we invite them to comment on a small selection of images, explaining why they like or dislike each one. Based on these comments, we infer a user's structured liked and disliked visual attributes, i.e., their visual preference, using a large language model. These attributes are then used to guide a text-to-image model toward producing images tuned to the individual user's visual preference. Through a series of user studies and large language model-guided evaluations, we demonstrate that the proposed method yields generations that are well aligned with individual users' visual preferences. Our code and model weights are open-sourced at https://viper.epfl.ch.
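To make the described pipeline concrete, below is a minimal sketch of how an extracted visual preference could steer an off-the-shelf text-to-image model. It is illustrative only and not the released ViPer implementation: it assumes the liked and disliked attributes have already been inferred by the language model from the user's comments, and it simply folds them into the prompt and negative prompt of a standard SDXL pipeline from the diffusers library. The attribute lists and the helper personalize_prompt are placeholders.

```python
# Minimal sketch (assumption-laden): condition an off-the-shelf SDXL pipeline
# on structured visual preferences that an LLM has already extracted from the
# user's free-form comments. The attribute lists and the prompt-merging
# strategy are illustrative, not the paper's released implementation.
import torch
from diffusers import StableDiffusionXLPipeline

# Structured preference inferred from the user's one-time comments
# (placeholder values).
liked = ["muted earth tones", "soft natural lighting", "minimal composition"]
disliked = ["oversaturated colors", "cluttered backgrounds"]

def personalize_prompt(prompt: str) -> tuple[str, str]:
    """Fold liked attributes into the prompt and disliked attributes into the
    negative prompt. One simple strategy among several possible ones."""
    positive = f"{prompt}, {', '.join(liked)}"
    negative = ", ".join(disliked)
    return positive, negative

# Standard SDXL text-to-image pipeline from the diffusers library.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

positive, negative = personalize_prompt("a quiet street cafe at dawn")
image = pipe(
    prompt=positive,
    negative_prompt=negative,
    guidance_scale=7.0,
    num_inference_steps=30,
).images[0]
image.save("personalized_cafe.png")
```

In this sketch the personalization is purely prompt-level; the same extracted attributes could equally be used with other conditioning mechanisms, which is a design choice left open here.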
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Salehi, S., Shafiei, M., Yeo, T., Bachmann, R., Zamir, A. (2025). ViPer: Visual Personalization of Generative Models via Individual Preference Learning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_23
DOI: https://doi.org/10.1007/978-3-031-72904-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72903-4
Online ISBN: 978-3-031-72904-1
eBook Packages: Computer Science, Computer Science (R0)