ABSTRACT
Adversarial images are specifically designed to fool neural networks into making wrong decisions about what they are looking at, which severely degrades neural network accuracy. Recent empirical and theoretical evidence suggests that robust neural network models tend to have more interpretable gradients. We therefore speculate that improving the interpretability of a model's gradients may in turn improve its robustness. We use two methods to add gradient-dependent penalty terms to the loss function of neural network models, and both improve robustness. The first method adds a fused lasso penalty on the saliency maps to the loss function, which encourages the saliency maps to be arranged in a natural (sparse and piecewise-smooth) way and thus improves their interpretability, and replaces ReLU with a gradient-enhanced ReLU to strengthen the effect of the regularization term on the saliency maps. The second method adds a cosine-similarity penalty between the input gradients and the image contours to the loss function, constraining the input gradients to approximate the image contours. This method has a biological motivation: the human visual system relies on contour information to recognize images. Both methods improve the interpretability of the model's gradients; the first method outperforms most regularization methods except adversarial training on MNIST, and the second even exceeds adversarial training under white-box attacks on CIFAR-10 and CIFAR-100.
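The paper's reference implementation is not reproduced here; the following is a minimal PyTorch sketch of how the two penalty terms described above might be attached to a standard cross-entropy loss. The helper names (saliency_map, fused_lasso_penalty, sobel_contour, total_loss), the loss weights, and the Sobel-based contour extractor are illustrative assumptions, and the sketch uses ordinary ReLU gradients rather than the gradient-enhanced ReLU mentioned above.

```python
import torch
import torch.nn.functional as F

def saliency_map(model, x, y):
    """Input-gradient saliency map: d(loss)/d(x) for the true labels."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # create_graph=True keeps the graph so the penalties below are trainable
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad

def fused_lasso_penalty(sal):
    """Fused-lasso style term: sparsity plus smoothness of the saliency map."""
    sparsity = sal.abs().mean()
    dh = (sal[..., :, 1:] - sal[..., :, :-1]).abs().mean()  # horizontal differences
    dv = (sal[..., 1:, :] - sal[..., :-1, :]).abs().mean()  # vertical differences
    return sparsity + dh + dv

def sobel_contour(x):
    """Rough image contour via per-channel Sobel filtering (an assumption;
    the paper may use a different contour extractor)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    ky = kx.t()
    c = x.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=c)
    gy = F.conv2d(x, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def total_loss(model, x, y, lam_fused=0.01, lam_cos=1.0):
    """Cross-entropy plus the two gradient-dependent penalties
    (in the paper the two methods are used separately)."""
    ce = F.cross_entropy(model(x), y)
    sal = saliency_map(model, x, y)
    fused = fused_lasso_penalty(sal)                     # method 1
    contour = sobel_contour(x)
    cos = F.cosine_similarity(sal.flatten(1), contour.flatten(1), dim=1).mean()
    # method 2: encourage alignment between input gradients and contours,
    # so the cosine similarity is subtracted from the loss
    return ce + lam_fused * fused - lam_cos * cos
```

In training, total_loss would simply replace the plain cross-entropy in the optimization loop; gradients of the penalty terms flow back to the model parameters through the second-order graph created in saliency_map.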