T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

While text-to-image diffusion models demonstrate impressive generation capabilities, they are also vulnerable to backdoor attacks, in which malicious triggers manipulate model outputs. In this paper, we propose, for the first time, a comprehensive defense method named T2IShield to detect, localize, and mitigate such attacks. Specifically, we identify the “Assimilation Phenomenon” that the backdoor trigger causes on the cross-attention maps. Based on this key insight, we propose two effective backdoor detection methods: Frobenius Norm Threshold Truncation and Covariance Discriminant Analysis. In addition, we introduce a binary-search approach to localize the trigger within a backdoor sample, and we assess the efficacy of existing concept editing methods in mitigating backdoor attacks. Empirical evaluations on two advanced backdoor attack scenarios show the effectiveness of the proposed defense method. For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% at low computational cost. Furthermore, T2IShield achieves a localization F1 score of 86.4% and invalidates 99% of poisoned samples. Code is released at https://github.com/Robin-WZQ/T2IShield.
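The abstract names a Frobenius-norm threshold and a binary search but gives no formulas, so the following is only an illustrative sketch, not the authors' implementation. It assumes that "assimilation" means the per-token cross-attention maps of a triggered prompt become nearly identical, so that their mean Frobenius distance to the average map falls below a threshold. It also assumes a prompt contains a single trigger token and that a detector fires on any sub-prompt containing it. The names `assimilation_score`, `looks_backdoored`, `localize_trigger`, and the `threshold` parameter are hypothetical.

```python
import numpy as np


def assimilation_score(attn_maps: np.ndarray) -> float:
    """Mean Frobenius distance between each token's map and the average map.

    attn_maps has shape (num_tokens, height, width). A small score means the
    maps are nearly identical (assumed here to indicate assimilation).
    """
    mean_map = attn_maps.mean(axis=0)
    diffs = attn_maps - mean_map
    # Frobenius norm per token, taken over the two spatial axes.
    norms = np.linalg.norm(diffs, ord="fro", axis=(1, 2))
    return float(norms.mean())


def looks_backdoored(attn_maps: np.ndarray, threshold: float) -> bool:
    """Flag a sample when its score falls below a hypothetical threshold."""
    return assimilation_score(attn_maps) < threshold


def localize_trigger(tokens, is_backdoored) -> int:
    """Binary search for the index of a single trigger token.

    Assumes the full prompt is flagged, exactly one token is the trigger,
    and is_backdoored(sub_tokens) is True iff sub_tokens contains it.
    """
    lo, hi = 0, len(tokens)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_backdoored(tokens[lo:mid]):
            hi = mid  # trigger lies in the left half
        else:
            lo = mid  # otherwise it must lie in the right half
    return lo
```

In practice the detector passed to `localize_trigger` would run the diffusion model on the sub-prompt and apply a check such as `looks_backdoored` to the resulting attention maps; here it is left as a callable so the search logic can be read on its own.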

Acknowledgement

This work is partially supported by the National Key R&D Program of China (No. 2021YFC3310100), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB0680000), the Beijing Nova Program (20230484368), the Suzhou Frontier Technology Research Project (No. SYG202325), and the Youth Innovation Promotion Association CAS.

Author information

Corresponding author

Correspondence to Jie Zhang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Z., Zhang, J., Shan, S., Chen, X. (2025). T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15143. Springer, Cham. https://doi.org/10.1007/978-3-031-73013-9_7

  • DOI: https://doi.org/10.1007/978-3-031-73013-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73012-2

  • Online ISBN: 978-3-031-73013-9

  • eBook Packages: Computer Science, Computer Science (R0)
