
From Deconstruction to Reconstruction: A Plug-In Module for Diffusion-Based Purification of Adversarial Examples

  • Conference paper
  • In: Digital Forensics and Watermarking (IWDW 2023)

Abstract

As the use of and reliance on AI technologies continue to grow, so does concern over adversarial example attacks, underscoring the pressing need for robust defense strategies that protect AI systems from malicious input manipulation. In this paper, we introduce a computationally efficient plug-in module that integrates seamlessly with advanced diffusion models to purify adversarial examples. Inspired by the concept of deconstruction and reconstruction (DR), the module decomposes an input image into foundational visual features that are expected to be robust against adversarial perturbations and then rebuilds the image with an image-to-image transformation neural network. Integrated with an advanced diffusion model, the module attains state-of-the-art performance in purifying adversarial examples while preserving high classification accuracy on clean image samples. Performance is evaluated on representative neural network classifiers pre-trained and fine-tuned on large-scale datasets, and an ablation study analyses the module's contribution to the effectiveness of diffusion-based purification. Notably, the module incurs only minimal computational overhead during purification.
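Below is a minimal sketch of how such a deconstruction-and-reconstruction (DR) plug-in could be wired in front of a diffusion purifier, assuming a PyTorch setting. The Gaussian-blur deconstruction, the small `ReconstructionNet`, the `diffusion_purifier` callable, and the ordering of the DR module relative to the diffusion step are all hypothetical placeholders chosen for illustration; they are not the components described in the paper.

```python
# Illustrative sketch only: the feature decomposition, network, and purifier
# interface below are assumptions, not the components used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def deconstruct(x: torch.Tensor, kernel_size: int = 9, sigma: float = 2.0) -> torch.Tensor:
    """Reduce a batch of images (N, C, H, W) to coarse structure via Gaussian blur.

    This stands in for decomposing the input into foundational visual features
    that are expected to be robust to adversarial perturbations.
    """
    coords = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] * g[None, :]).expand(x.shape[1], 1, kernel_size, kernel_size).contiguous()
    return F.conv2d(x, kernel.to(x), padding=kernel_size // 2, groups=x.shape[1])


class ReconstructionNet(nn.Module):
    """Small image-to-image network that rebuilds a full image from the coarse features."""

    def __init__(self, channels: int = 3, width: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Predict a residual on top of the coarse features and keep pixels in [0, 1].
        return (feats + self.body(feats)).clamp(0.0, 1.0)


def purify(x_adv: torch.Tensor, recon: ReconstructionNet, diffusion_purifier=None) -> torch.Tensor:
    """DR purification: deconstruct, reconstruct, then (optionally) diffusion purification.

    `diffusion_purifier` is any callable mapping an image batch to a purified batch
    (e.g. a DiffPure-style model); it is treated as an opaque plug-in target here.
    """
    feats = deconstruct(x_adv)
    rebuilt = recon(feats)
    return diffusion_purifier(rebuilt) if diffusion_purifier is not None else rebuilt


if __name__ == "__main__":
    recon = ReconstructionNet()
    x_adv = torch.rand(2, 3, 224, 224)   # stand-in for adversarial examples
    x_pure = purify(x_adv, recon)        # a downstream classifier would consume x_pure
    print(x_pure.shape)                  # torch.Size([2, 3, 224, 224])
```

In practice the reconstruction network would be trained on clean images so that, given only the robust features, it outputs natural-looking images whose class labels are preserved; that training loop and the diffusion model itself are omitted from this sketch.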

E. Bao and C.-C. Chang—These authors contributed equally.



Acknowledgments

This work was partially supported by JSPS KAKENHI Grants JP18H04120, JP20K23355, JP21H04907, and JP21K18023, and by JST CREST Grants JPMJCR18A6 and JPMJCR20D3, Japan.

Author information


Correspondence to Erjin Bao.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bao, E., Chang, CC., Nguyen, H.H., Echizen, I. (2024). From Deconstruction to Reconstruction: A Plug-In Module for Diffusion-Based Purification of Adversarial Examples. In: Ma, B., Li, J., Li, Q. (eds) Digital Forensics and Watermarking. IWDW 2023. Lecture Notes in Computer Science, vol 14511. Springer, Singapore. https://doi.org/10.1007/978-981-97-2585-4_4


  • DOI: https://doi.org/10.1007/978-981-97-2585-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2584-7

  • Online ISBN: 978-981-97-2585-4

  • eBook Packages: Computer Science, Computer Science (R0)
