
From Deconstruction to Reconstruction: A Plug-In Module for Diffusion-Based Purification of Adversarial Examples

  • Conference paper
  • In: Digital Forensics and Watermarking (IWDW 2023)

Abstract

As the use of and reliance on AI technologies continue to grow, so does concern over adversarial example attacks, underscoring the pressing need for robust defense strategies that protect AI systems from malicious input manipulation. In this paper, we introduce a computationally efficient plug-in module that integrates seamlessly with advanced diffusion models to purify adversarial examples. Inspired by the concept of deconstruction and reconstruction (DR), the module decomposes an input image into foundational visual features that are expected to be robust against adversarial perturbations and then rebuilds the image with an image-to-image transformation neural network. Integrated with an advanced diffusion model, the module attains state-of-the-art performance in purifying adversarial examples while preserving high classification accuracy on clean image samples. Performance is evaluated on representative neural network classifiers pre-trained and fine-tuned on large-scale datasets, and an ablation study analyses the module's contribution to the effectiveness of diffusion-based purification. Notably, the module incurs only minimal computational overhead during purification.
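Below is a minimal sketch of how such a deconstruction-and-reconstruction (DR) plug-in could be wired in front of a diffusion purifier, assuming a PyTorch setting. The Gaussian-blur deconstruction, the small `ReconstructionNet`, the `diffusion_purifier` callable, and the ordering of the DR module relative to the diffusion step are all hypothetical placeholders chosen for illustration; they are not the components described in the paper.

```python
# Illustrative sketch only: the feature decomposition, network, and purifier
# interface below are assumptions, not the components used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def deconstruct(x: torch.Tensor, kernel_size: int = 9, sigma: float = 2.0) -> torch.Tensor:
    """Reduce a batch of images (N, C, H, W) to coarse structure via Gaussian blur.

    This stands in for decomposing the input into foundational visual features
    that are expected to be robust to adversarial perturbations.
    """
    coords = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] * g[None, :]).expand(x.shape[1], 1, kernel_size, kernel_size).contiguous()
    return F.conv2d(x, kernel.to(x), padding=kernel_size // 2, groups=x.shape[1])


class ReconstructionNet(nn.Module):
    """Small image-to-image network that rebuilds a full image from the coarse features."""

    def __init__(self, channels: int = 3, width: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Predict a residual on top of the coarse features and keep pixels in [0, 1].
        return (feats + self.body(feats)).clamp(0.0, 1.0)


def purify(x_adv: torch.Tensor, recon: ReconstructionNet, diffusion_purifier=None) -> torch.Tensor:
    """DR purification: deconstruct, reconstruct, then (optionally) diffusion purification.

    `diffusion_purifier` is any callable mapping an image batch to a purified batch
    (e.g. a DiffPure-style model); it is treated as an opaque plug-in target here.
    """
    feats = deconstruct(x_adv)
    rebuilt = recon(feats)
    return diffusion_purifier(rebuilt) if diffusion_purifier is not None else rebuilt


if __name__ == "__main__":
    recon = ReconstructionNet()
    x_adv = torch.rand(2, 3, 224, 224)   # stand-in for adversarial examples
    x_pure = purify(x_adv, recon)        # a downstream classifier would consume x_pure
    print(x_pure.shape)                  # torch.Size([2, 3, 224, 224])
```

In practice the reconstruction network would be trained on clean images so that, given only the robust features, it outputs natural-looking images whose class labels are preserved; that training loop and the diffusion model itself are omitted from this sketch.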

E. Bao and C.-C. Chang—These authors contributed equally.



Acknowledgments

This work was partially supported by JSPS KAKENHI Grants JP18H04120, JP20K23355, JP21H04907, and JP21K18023, and by JST CREST Grants JPMJCR18A6 and JPMJCR20D3, Japan.

Author information


Correspondence to Erjin Bao.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bao, E., Chang, CC., Nguyen, H.H., Echizen, I. (2024). From Deconstruction to Reconstruction: A Plug-In Module for Diffusion-Based Purification of Adversarial Examples. In: Ma, B., Li, J., Li, Q. (eds) Digital Forensics and Watermarking. IWDW 2023. Lecture Notes in Computer Science, vol 14511. Springer, Singapore. https://doi.org/10.1007/978-981-97-2585-4_4


  • DOI: https://doi.org/10.1007/978-981-97-2585-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2584-7

  • Online ISBN: 978-981-97-2585-4

  • eBook Packages: Computer Science, Computer Science (R0)
