Abstract
While text-to-image diffusion models demonstrate impressive generation capabilities, they also exhibit vulnerability to backdoor attacks, which manipulate model outputs through malicious triggers. In this paper, for the first time, we propose a comprehensive defense method named T2IShield to detect, localize, and mitigate such attacks. Specifically, we find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger. Based on this key insight, we propose two effective backdoor detection methods: Frobenius Norm Threshold Truncation and Covariance Discriminant Analysis. In addition, we introduce a binary-search approach to localize the trigger within a backdoor sample and assess the efficacy of existing concept editing methods in mitigating backdoor attacks. Empirical evaluations on two advanced backdoor attack scenarios show the effectiveness of our proposed defense method. For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost. Furthermore, T2IShield achieves a localization F1 score of 86.4% and invalidates 99% of poisoned samples. Code is released at https://github.com/Robin-WZQ/T2IShield.
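To make the detection idea concrete, the following is a minimal, hypothetical sketch of Frobenius Norm Threshold Truncation as the abstract describes it: a trigger causes the cross-attention maps of all tokens to "assimilate" (look alike), so an unusually small total deviation from the mean map signals a backdoor sample. The function name, array shapes, and threshold value are illustrative assumptions, not the paper's tuned implementation.

```python
import numpy as np

def frobenius_detect(attn_maps, threshold=2.5):
    """Flag a prompt as a possible backdoor sample.

    attn_maps: (T, H, W) array with one averaged cross-attention map
    per prompt token. Under the "Assimilation Phenomenon", a trigger
    forces all token maps to look alike, so each map's deviation from
    the mean map becomes unusually small. `threshold` is illustrative.
    """
    maps = np.asarray(attn_maps, dtype=float)
    mean_map = maps.mean(axis=0)
    # Frobenius norm of each token map's deviation from the mean map
    deviations = np.linalg.norm(maps - mean_map, ord="fro", axis=(1, 2))
    # Small average deviation => maps have "assimilated" => suspicious
    return bool(deviations.mean() < threshold)
```

A benign prompt yields token maps that attend to different image regions, giving a large average deviation; a triggered prompt collapses the maps together and falls below the threshold.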
Acknowledgement
This work is partially supported by the National Key R&D Program of China (No. 2021YFC3310100), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB0680000), the Beijing Nova Program (No. 20230484368), the Suzhou Frontier Technology Research Project (No. SYG202325), and the Youth Innovation Promotion Association CAS.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, Z., Zhang, J., Shan, S., Chen, X. (2025). T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15143. Springer, Cham. https://doi.org/10.1007/978-3-031-73013-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73012-2
Online ISBN: 978-3-031-73013-9