Poster: Adversarial Defense with Deep Learning Coverage on MagNet's Purification

ABSTRACT
MagNet is a defense method that uses autoencoders to detect and purify adversarial examples. Although MagNet is robust against grey-box and black-box attacks, it is known to be vulnerable to white-box attacks; despite this prior knowledge, the fundamental cause of the vulnerability and how to mitigate it have not been examined. We argue that MagNet's core challenge is generalizing over the data manifold. To address this, we leverage deep learning coverage to train MagNet's reformer: we mutate training images with image transformation algorithms and train the reformer on the mutants that contribute new coverage information. The selected mutants expose the reformer to regions of the data manifold that MagNet's random-noise training cannot cover. In grey-box settings, our defense classified adversarial examples far more accurately than MagNet across a range of perturbation sizes, even with the same architecture. Building on this preliminary result, we plan to investigate whether the generalization power of deep learning coverage remains effective against stronger adversaries and with different architectures.
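The abstract compresses the pipeline, so the sketch below spells out one plausible reading of it: mutate each training image with simple transformations, keep only the mutants that increase DeepXplore-style neuron coverage on an intermediate layer, and hand the survivors to the reformer as training data. This is an assumption-laden illustration rather than the authors' code: the three transformations, the 0.5 activation threshold, the probed layer, and all function names (`mutate`, `select_mutants`) are hypothetical.

```python
# Hypothetical sketch of coverage-guided mutant selection for reformer
# training. Not the authors' implementation: transformations, threshold,
# probed layer, and all names are illustrative assumptions.
import numpy as np
import tensorflow as tf

def mutate(image, rng):
    """Apply one randomly chosen image transformation (stand-ins for the
    'image transformation algorithms' the abstract mentions)."""
    choice = rng.integers(3)
    if choice == 0:  # brightness shift
        return np.clip(image + rng.uniform(-0.2, 0.2), 0.0, 1.0)
    if choice == 1:  # contrast scaling
        return np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)
    return np.roll(image, rng.integers(-2, 3), axis=0)  # small translation

def neuron_coverage(activations, threshold=0.5):
    """Indices of neurons whose activation exceeds the threshold
    (DeepXplore-style neuron coverage)."""
    return {i for i, a in enumerate(activations.ravel()) if a > threshold}

def select_mutants(model, images, per_image=10, seed=0):
    """Keep only mutants that activate neurons not covered so far."""
    rng = np.random.default_rng(seed)
    covered, selected = set(), []
    # Probe an intermediate layer of the target classifier; assumes a
    # Keras functional/sequential model.
    probe = tf.keras.Model(model.input, model.layers[-2].output)
    for x in images:
        for _ in range(per_image):
            m = mutate(x, rng)
            acts = probe(m[None, ...].astype("float32"),
                         training=False).numpy()
            new = neuron_coverage(acts) - covered
            if new:  # mutant adds coverage: keep it
                covered |= new
                selected.append(m)
    return np.array(selected)
```

Under this reading, the reformer would then be trained as a denoising autoencoder that reconstructs each clean image from its selected mutants, in place of MagNet's original random-noise corruption.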
REFERENCES
- Nicholas Carlini and David Wagner. 2017. MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples. arXiv preprint arXiv:1711.08478 (2017).
- Nicholas Carlini and David Wagner. 2017. Towards Evaluating the Robustness of Neural Networks. In Proc. the IEEE SP. 39--57.
- Zhong Li et al. 2021. Testing DNN-based Autonomous Driving Systems under Critical Environmental Conditions. In Proc. the ICML. 6471--6482.
- Lei Ma et al. 2018. DeepGauge: Multi-granularity Testing Criteria for Deep Learning Systems. In Proc. the ACM/IEEE ASE. 120--131.
- Dongyu Meng et al. 2017. MagNet: A Two-Pronged Defense against Adversarial Examples. In Proc. the ACM CCS. 135--147.
- Weili Nie et al. 2022. Diffusion Models for Adversarial Purification. In Proc. the ICML.
- Kexin Pei et al. 2017. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. In Proc. the SOSP. 1--18.
- Xiaofei Xie et al. 2019. DeepHunter: A Coverage-guided Fuzz Testing Framework for Deep Neural Networks. In Proc. the ACM ISSTA. 146--157.
- Shenao Yan et al. 2020. Correlations between Deep Neural Network Model Coverage Criteria and Model Quality. In Proc. the ACM FSE. 775--787.