Abstract
Different users find different images generated for the same prompt desirable. This gives rise to personalized image generation, which involves creating images aligned with an individual's visual preference. Current generative models, however, are tuned to produce outputs that appeal to a broad audience and are therefore unpersonalized. Using them to generate images aligned with an individual user relies on iterative manual prompt engineering by the user, which is inefficient and undesirable.
We propose to personalize the image generation process by first capturing the generic preferences of the user in a one-time process: we invite them to comment on a small selection of images, explaining why they like or dislike each one. Based on these comments, we infer a user's structured liked and disliked visual attributes, i.e., their visual preference, using a large language model. These attributes are then used to guide a text-to-image model toward producing images tuned to the individual user's visual preference. Through a series of user studies and large language model-guided evaluations, we demonstrate that the proposed method yields generations that are well aligned with individual users' visual preferences. Our code and model weights are open-sourced at https://viper.epfl.ch.
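To make the described pipeline concrete, below is a minimal sketch of how an extracted visual preference could steer an off-the-shelf text-to-image model. It is illustrative only and not the released ViPer implementation: it assumes the liked and disliked attributes have already been inferred by the language model from the user's comments, and it simply folds them into the prompt and negative prompt of a standard SDXL pipeline from the diffusers library. The attribute lists and the helper personalize_prompt are placeholders.

```python
# Minimal sketch (assumption-laden): condition an off-the-shelf SDXL pipeline
# on structured visual preferences that an LLM has already extracted from the
# user's free-form comments. The attribute lists and the prompt-merging
# strategy are illustrative, not the paper's released implementation.
import torch
from diffusers import StableDiffusionXLPipeline

# Structured preference inferred from the user's one-time comments
# (placeholder values).
liked = ["muted earth tones", "soft natural lighting", "minimal composition"]
disliked = ["oversaturated colors", "cluttered backgrounds"]

def personalize_prompt(prompt: str) -> tuple[str, str]:
    """Fold liked attributes into the prompt and disliked attributes into the
    negative prompt. One simple strategy among several possible ones."""
    positive = f"{prompt}, {', '.join(liked)}"
    negative = ", ".join(disliked)
    return positive, negative

# Standard SDXL text-to-image pipeline from the diffusers library.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

positive, negative = personalize_prompt("a quiet street cafe at dawn")
image = pipe(
    prompt=positive,
    negative_prompt=negative,
    guidance_scale=7.0,
    num_inference_steps=30,
).images[0]
image.save("personalized_cafe.png")
```

In this sketch the personalization is purely prompt-level; the same extracted attributes could equally be used with other conditioning mechanisms, which is a design choice left open here.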
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Salehi, S., Shafiei, M., Yeo, T., Bachmann, R., Zamir, A. (2025). ViPer: Visual Personalization of Generative Models via Individual Preference Learning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_23
DOI: https://doi.org/10.1007/978-3-031-72904-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72903-4
Online ISBN: 978-3-031-72904-1
eBook Packages: Computer Science, Computer Science (R0)