DOI: 10.1145/3472634.3472644

Research article

Fighting Adversarial Images With Interpretable Gradients

Published: 02 October 2021

ABSTRACT

Adversarial images are crafted to fool neural networks into misclassifying what they see, severely degrading their accuracy. Recent empirical and theoretical evidence suggests that robust neural network models tend to have more interpretable gradients. We therefore conjecture that improving the interpretability of a model's gradients may in turn improve its robustness. We add gradient-dependent constraint terms to the loss function of neural network models in two ways, and both improve robustness. The first method adds a fused lasso penalty on the saliency maps to the loss function, which encourages the saliency maps to be arranged in a natural way and thus become more interpretable, and it replaces ReLU with a gradient-enhanced ReLU to strengthen the effect of the regularization term on the saliency maps. The second method adds a cosine-similarity penalty between the input gradients and the image contour to the loss function, constraining the input gradients to align with the contour. This has a certain biological motivation, since the human visual system relies on contour information to recognize images. Both methods improve the interpretability of the model's gradients; the first outperforms most regularization methods except adversarial training on MNIST, and the second even surpasses adversarial training under white-box attacks on CIFAR-10 and CIFAR-100.
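To make the two gradient-dependent constraint terms concrete, the following is a minimal PyTorch sketch, not the authors' code. It assumes the saliency map is the input gradient of the cross-entropy loss, uses a 2D anisotropic form of the fused lasso, and substitutes Sobel edges for the "image contour"; the weights lam1 and lam2 are hypothetical, and the gradient-enhanced ReLU described in the abstract is omitted.

```python
import torch
import torch.nn.functional as F

def saliency_map(model, x, y):
    """Input gradient of the cross-entropy loss w.r.t. the image (assumed saliency map)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # create_graph=True so penalties on the gradient can backpropagate into the weights
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad

def fused_lasso_penalty(sal):
    """Method 1 (sketch): L1 sparsity plus absolute differences between
    neighbouring pixels of the saliency map (a 2D fused-lasso-style term)."""
    l1 = sal.abs().mean()
    smooth = (sal[..., :, 1:] - sal[..., :, :-1]).abs().mean() \
           + (sal[..., 1:, :] - sal[..., :-1, :]).abs().mean()
    return l1 + smooth

def contour_alignment_penalty(sal, x):
    """Method 2 (sketch): penalise low cosine similarity between the input
    gradient and an edge map of the image (Sobel edges stand in for the contour)."""
    c = x.shape[1]
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=x.device, dtype=x.dtype)
    sobel_y = sobel_x.t()
    kx = sobel_x.reshape(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = sobel_y.reshape(1, 1, 3, 3).repeat(c, 1, 1, 1)
    edges = F.conv2d(x, kx, padding=1, groups=c).abs() \
          + F.conv2d(x, ky, padding=1, groups=c).abs()
    cos = F.cosine_similarity(sal.flatten(1), edges.flatten(1), dim=1)
    return (1.0 - cos).mean()

def regularized_loss(model, x, y, lam1=0.0, lam2=0.0):
    """Cross-entropy plus the gradient-dependent constraint terms."""
    sal = saliency_map(model, x, y)
    ce = F.cross_entropy(model(x), y)
    return ce + lam1 * fused_lasso_penalty(sal) + lam2 * contour_alignment_penalty(sal, x)
```

In training, regularized_loss would replace the plain cross-entropy objective; the hypothetical lam1 and lam2 trade classification accuracy against how strongly the saliency maps are pushed toward sparsity, smoothness, or contour alignment.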


Published in

ACM TURC '21: Proceedings of the ACM Turing Award Celebration Conference - China
July 2021, 284 pages
ISBN: 9781450385671
DOI: 10.1145/3472634

        Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
