Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15146)

Abstract

Motivated by ethical and legal concerns, the scientific community is actively developing methods to limit the misuse of Text-to-Image diffusion models for reproducing copyrighted, violent, explicit, or personal information in the generated images. Simultaneously, researchers put these newly developed safety measures to the test by assuming the role of an adversary to find vulnerabilities and backdoors in them. We use the compositional property of diffusion models, which allows multiple prompts to be leveraged in a single image generation. This property lets us combine other concepts, which should not have been affected by the inhibition, to reconstruct the vector responsible for target concept generation, even though direct computation of this vector is no longer accessible. We provide theoretical and empirical evidence of why the proposed attacks are possible and discuss the implications of these findings for safe model deployment. We argue that it is essential to consider all possible approaches to image generation with diffusion models that can be employed by an adversary. Our work opens up the discussion about the implications of concept arithmetics and compositional inference for safety mechanisms in diffusion models.

Content Advisory: This paper contains discussions and model-generated content that may be considered offensive. Reader discretion is advised.

Project page: https://cs-people.bu.edu/vpetsiuk/arc
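
The attack builds on the compositional guidance formulation of classifier-free guidance [13] and composable diffusion [17]: the noise prediction at each sampling step can be written as a weighted combination of per-prompt conditional predictions, so a suppressed concept direction can be re-assembled from prompts that the inhibition left intact. The sketch below illustrates this arithmetic assuming a diffusers-style UNet interface; the helper name composed_eps, the weights, and the prompt set are illustrative rather than the authors' exact implementation.

    def composed_eps(unet, latents, t, uncond_emb, cond_embs, weights):
        """Compose noise predictions from several prompt embeddings.

        Classifier-free-guidance-style arithmetic [13, 17]:
            eps = eps_uncond + sum_i w_i * (eps_i - eps_uncond),
        where each eps_i is conditioned on one prompt embedding. With suitable
        prompts and weights, the combined direction can approximate a concept
        whose direct prompt is suppressed in the inhibited model.
        """
        eps_uncond = unet(latents, t, encoder_hidden_states=uncond_emb).sample
        eps = eps_uncond.clone()
        for emb, w in zip(cond_embs, weights):
            eps_i = unet(latents, t, encoder_hidden_states=emb).sample
            eps = eps + w * (eps_i - eps_uncond)
        return eps

In such a compositional-inference setting, substituting this composed prediction for the usual single-prompt prediction in the sampling loop is the only change; the inhibited model's weights are not modified.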

Notes

  1.

    https://docs.midjourney.com/docs/multi-prompts,

    https://platform.stability.ai/docs/features/multi-prompting.

  2.

    Throughout, we imply that the string is embedded with the CLIP [24] textual encoder before being passed to \(\epsilon\); a minimal sketch of this embedding step is shown below.
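
    A minimal sketch of that embedding step, assuming a Stable Diffusion v1.x setup whose \(\epsilon\)-network is conditioned on the CLIP ViT-L/14 text encoder (the Hugging Face checkpoint name below is an assumption, not taken from the paper):

        import torch
        from transformers import CLIPTokenizer, CLIPTextModel

        # Assumed checkpoint: the CLIP ViT-L/14 text encoder used by Stable Diffusion v1.x.
        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
        text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

        def embed_prompt(prompt: str) -> torch.Tensor:
            """Return the per-token CLIP embeddings (1, 77, 768) that condition epsilon."""
            tokens = tokenizer(
                prompt,
                padding="max_length",
                max_length=tokenizer.model_max_length,
                truncation=True,
                return_tensors="pt",
            )
            with torch.no_grad():
                return text_encoder(tokens.input_ids).last_hidden_state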

References

  1. Tutorial: How to remove the safety filter in 5 seconds. https://www.reddit.com/r/StableDiffusion/comments/wv2nw0/tutorial_how_to_remove_the_safety_filter_in_5/

  2. Birhane, A., Prabhu, V.U., Kahembwe, E.: Multimodal datasets: misogyny, pornography, and malignant stereotypes. arXiv preprint arXiv:2110.01963 (2021)

  3. Brack, M., Schramowski, P., Friedrich, F., Hintersdorf, D., Kersting, K.: The stable artist: Steering semantics in diffusion latent space (2022)

  4. Chin, Z.Y., Jiang, C.M., Huang, C.C., Chen, P.Y., Chiu, W.C.: Prompting4Debugging: red-teaming text-to-image diffusion models by finding problematic prompts. arXiv preprint arXiv:2309.06135 (2023)

  5. Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. Adv. Neural. Inf. Process. Syst. 34, 8780–8794 (2021)

  6. Fernandez, P., Couairon, G., Jégou, H., Douze, M., Furon, T.: The stable signature: Rooting watermarks in latent diffusion models. arXiv preprint arXiv:2303.15435 (2023)

  7. Gandikota, R., Materzynska, J., Fiotto-Kaufman, J., Bau, D.: Erasing concepts from diffusion models (2023)

  8. Gandikota, R., Orgad, H., Belinkov, Y., Materzyńska, J., Bau, D.: Unified concept editing in diffusion models. IEEE/CVF Winter Conference on Applications of Computer Vision (2024)

  9. Harris, D.: Deepfakes: false pornography is here and the law cannot protect you. Duke Law Technol. Rev. 17(1), 99–127 (2019). https://scholarship.law.duke.edu/dltr/vol17/iss1/4

  10. Heng, A., Soh, H.: Selective amnesia: a continual learning approach to forgetting in deep generative models. Advances in Neural Information Processing Systems 36 (2024)

  11. Hessel, J., Holtzman, A., Forbes, M., Le Bras, R., Choi, Y.: CLIPScore: a reference-free evaluation metric for image captioning. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7514–7528 (2021)

  12. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural. Inf. Process. Syst. 33, 6840–6851 (2020)

  13. Ho, J., Salimans, T.: Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022)

  14. Howard, J., Gugger, S.: fastai: a layered API for deep learning. Inf. 11, 108 (2020). https://api.semanticscholar.org/CorpusID:211082837

  15. Jiang, Z., Zhang, J., Gong, N.Z.: Evading watermark based detection of AI-generated content. In: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pp. 1168–1181 (2023)

  16. Kumari, N., Zhang, B., Wang, S.Y., Shechtman, E., Zhang, R., Zhu, J.Y.: Ablating concepts in text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22691–22702 (2023)

  17. Liu, N., Li, S., Du, Y., Torralba, A., Tenenbaum, J.B.: Compositional visual generation with composable diffusion models. In: European Conference on Computer Vision, pp. 423–439. Springer (2022)

  18. Luccioni, A.S., Akiki, C., Mitchell, M., Jernite, Y.: Stable bias: analyzing societal representations in diffusion models. arXiv preprint arXiv:2303.11408 (2023)

  19. Myhand, T.: Once the Jury Sees It, the Jury Can’t Unsee It: The Challenge Trial Judges Face When Authenticating Video Evidence in the Age of Deepfakes. preprint (2022). https://doi.org/10.2139/ssrn.4270735. https://papers.ssrn.com/abstract=4270735

  20. Naik, R., Nushi, B.: Social biases through the text-to-image generation lens. In: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pp. 786–808 (2023)

  21. Nichol, A.Q., Dhariwal, P.: Improved denoising diffusion probabilistic models. In: International Conference on Machine Learning, pp. 8162–8171. PMLR (2021)

  22. OpenAI: ChatGPT (2022). https://openai.com/blog/chatgpt

  23. Praneeth, B., Koonce, B., Ayinmehr, A.: bedapudi6788/nudenet: place for checkpoint files, December 2019. https://doi.org/10.5281/zenodo.3584720

  24. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)

  25. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125 (2022)

  26. Rando, J., Paleka, D., Lindner, D., Heim, L., Tramèr, F.: Red-teaming the stable diffusion safety filter. arXiv preprint arXiv:2210.04610 (2022)

  27. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695 (2022)

  28. Roose, K.: An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy. https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html

  29. Saharia, C., et al.: Photorealistic text-to-image diffusion models with deep language understanding. Adv. Neural. Inf. Process. Syst. 35, 36479–36494 (2022)

  30. Schramowski, P., Brack, M., Deiseroth, B., Kersting, K.: Safe latent diffusion: mitigating inappropriate degeneration in diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22522–22531 (2023)

  31. Shan, S., Cryan, J., Wenger, E., Zheng, H., Hanocka, R., Zhao, B.Y.: Glaze: protecting artists from style mimicry by Text-to-Image models. In: 32nd USENIX Security Symposium (USENIX Security 23), pp. 2187–2204 (2023)

  32. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, pp. 2256–2265. PMLR (2015)

  33. Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems 32 (2019)

  34. Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020)

  35. Tsai, Y.L., et al.: Ring-a-bell! how reliable are concept removal methods for diffusion models? In: International Conference on Learning Representations (2024)

  36. Van Le, T., Phung, H., Nguyen, T.H., Dao, Q., Tran, N.N., Tran, A.: Anti-DreamBooth: protecting users from personalized text-to-image synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2116–2127 (2023)

  37. Wang, H., Shen, Q., Tong, Y., Zhang, Y., Kawaguchi, K.: The stronger the diffusion model, the easier the backdoor: data poisoning to induce copyright breaches without adjusting finetuning pipeline. arXiv preprint arXiv:2401.04136 (2024)

  38. Wen, Y., Kirchenbauer, J., Geiping, J., Goldstein, T.: Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust (2023)

  39. Yang, Y., Gao, R., Wang, X., Ho, T.Y., Xu, N., Xu, Q.: MMA-Diffusion: multimodal attack on diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7737–7746 (2024)

  40. Zhang, G., Wang, K., Xu, X., Wang, Z., Shi, H.: Forget-me-not: learning to forget in text-to-image diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1755–1764 (2024)

  41. Zhao, Y., Pang, T., Du, C., Yang, X., Cheung, N.M., Lin, M.: A recipe for watermarking diffusion models. arXiv preprint arXiv:2303.10137 (2023)

Author information

Correspondence to Vitali Petsiuk.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 34244 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Petsiuk, V., Saenko, K. (2025). Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15146. Springer, Cham. https://doi.org/10.1007/978-3-031-73223-2_18

  • DOI: https://doi.org/10.1007/978-3-031-73223-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73222-5

  • Online ISBN: 978-3-031-73223-2

  • eBook Packages: Computer Science, Computer Science (R0)
