Abstract
The adversarial patch is a practical and effective attack that modifies a small region of an image so that DNNs misclassify it. Existing empirical defenses against patch attacks lack theoretical analysis and are vulnerable to adaptive attacks. To overcome these shortcomings, certified defenses have been proposed that guarantee classification performance even against strong, unknown adversarial attacks. However, existing certified defenses either suffer from low clean accuracy or require specialized architectures, which limits their robustness; moreover, they provide only provable accuracy and ignore its relationship to the number of perturbed pixels. In this paper, we propose a certified defense against patch attacks that provides both a provable radius and high classification accuracy. By adding Gaussian noise only to the patch region via a mask, we prove that randomized smoothing yields a stronger certificate with high confidence. Furthermore, we design a practical scheme based on joint voting that locates the patch with high probability and certifies it effectively. Our defense achieves 86.4% clean accuracy and 71.8% certified accuracy on CIFAR-10, exceeding the 60% maximum certified accuracy of existing methods. On ImageNet, its 67.8% clean accuracy and 53.6% certified accuracy surpass the state-of-the-art method, whose certified accuracy is 26%.
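To illustrate the core idea of masked randomized smoothing with majority voting, the following NumPy sketch adds Gaussian noise only inside a suspected patch region and aggregates votes over noisy samples. This is a minimal illustration, not the authors' exact procedure; the classifier `classify`, the mask location, and all parameters (`sigma`, `n_samples`) are hypothetical.

```python
import numpy as np

def smoothed_votes(classify, image, mask, sigma=0.5, n_samples=50, seed=0):
    """Monte-Carlo vote counts of a mask-guided smoothed classifier.

    classify  : function mapping an (H, W, C) image to a class index
    image     : clean input, float array with values in [0, 1]
    mask      : (H, W) boolean array, True on the suspected patch region
    sigma     : std of the Gaussian noise added inside the mask only
    n_samples : number of noisy samples to vote over
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        noise = rng.normal(0.0, sigma, size=image.shape)
        # noise is applied only where the mask is True
        noisy = image + noise * mask[..., None]
        c = classify(noisy)
        votes[c] = votes.get(c, 0) + 1
    return votes

# Toy demonstration: a classifier driven by the region outside the
# (hypothetical) 3x3 patch is unaffected by the masked noise, so all
# votes agree, as a certificate would require.
image = np.full((8, 8, 1), 0.9)
mask = np.zeros((8, 8), dtype=bool)
mask[:3, :3] = True

def toy_classify(x):
    # predicts class 1 if the mean intensity outside the mask exceeds 0.5
    return int(x[~mask].mean() > 0.5)

votes = smoothed_votes(toy_classify, image, mask)
print(votes)  # all 50 votes go to class 1
```

In the actual defense, the vote margin of the top class would be converted into a certified radius via the standard randomized-smoothing bound; the sketch above only shows the masked-noise voting mechanism.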
Acknowledgements
This work was supported in part by National Natural Science Foundation of China (Grant Nos. U20B2047, 62072421, 62002334, 62102386, 62121002), Exploration Fund Project of University of Science and Technology of China (Grant No. YD348000-2001), and Fundamental Research Funds for the Central Universities (Grant No. WK2100000011).
Supporting information Appendixes A–C. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Zhang, K., Zhou, H., Bian, H. et al. Certified defense against patch attacks via mask-guided randomized smoothing. Sci. China Inf. Sci. 65, 170306 (2022). https://doi.org/10.1007/s11432-021-3457-7