Abstract
Accurate interpretations of how a neural network reaches its decisions are important. A manipulated explanation, however, can mislead human users into distrusting a reliable network. It is therefore necessary to verify interpretation algorithms by designing effective attacks that simulate the threats they may face in the real world. In this work, we explore how to mislead interpretations: we optimize the noise added to an input image so that the explanation highlights a region we specify, without changing the network's output category. With the proposed algorithm, we demonstrate that state-of-the-art saliency-map interpreters, e.g., Grad-CAM, Guided-Feature-Inversion, Grad-CAM++, Score-CAM, and Full-Grad, can be easily fooled. We consider two attack settings, the single-target attack and the multi-target attack, and show that the fooling transfers across interpretation methods and, with universal noise, generalizes to unseen samples. We also use image patches to fool Grad-CAM. The results are validated both qualitatively and quantitatively, and we further propose a quantitative metric to measure the effectiveness of the attack. We believe our method can serve as an additional robustness evaluation for future interpretation algorithms.
Supported by the National Natural Science Foundation of China (61772111).
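As an illustration of the single-target attack described in the abstract, the following is a minimal sketch, not the authors' released code, of optimizing additive noise so that a Grad-CAM heat map concentrates on a chosen target region while the predicted class is preserved. The VGG-16 backbone, the mean-squared-error term on the heat map, the cross-entropy term on the original class, the Adam optimizer, and all hyperparameter values are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of a single-target saliency attack; details are assumptions,
# not the paper's implementation.
import torch
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.vgg16(pretrained=True).to(device).eval()

# Capture the output of the last layer of the convolutional trunk with a forward hook.
feats = {}
model.features[-1].register_forward_hook(lambda m, i, o: feats.update(a=o))


def grad_cam(x, class_idx):
    """Differentiable Grad-CAM heat map for a single image (batch size 1)."""
    logits = model(x)
    act = feats["a"]
    # create_graph=True keeps this gradient inside the autograd graph, so the
    # heat map itself can be optimized with respect to the input noise.
    grad = torch.autograd.grad(logits[0, class_idx], act, create_graph=True)[0]
    weights = grad.mean(dim=(2, 3), keepdim=True)        # channel weights = pooled gradients
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))
    return cam / (cam.max() + 1e-8), logits              # heat map normalized to [0, 1]


def single_target_attack(image, target_mask, steps=200, lr=0.01, eps=8 / 255, lam=1.0):
    """Optimize additive noise so the heat map moves onto target_mask while the
    predicted class stays unchanged. `image` is a (1, 3, H, W) tensor in [0, 1],
    `target_mask` a (1, 1, H, W) binary mask; input normalization is omitted."""
    with torch.no_grad():
        orig_class = model(image).argmax(dim=1)
    noise = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        x = (image + noise).clamp(0, 1)
        cam, logits = grad_cam(x, orig_class.item())
        mask = F.interpolate(target_mask, size=cam.shape[-2:], mode="bilinear")
        loss = F.mse_loss(cam, mask) + lam * F.cross_entropy(logits, orig_class)
        opt.zero_grad()
        loss.backward()
        opt.step()
        noise.data.clamp_(-eps, eps)                      # keep the perturbation small
    return (image + noise).clamp(0, 1).detach()
```

The other interpreters named in the abstract (Guided-Feature-Inversion, Grad-CAM++, Score-CAM, Full-Grad) would require swapping in their own differentiable saliency computation in place of grad_cam.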
References
Selvaraju, R.R., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Du, M., et al.: Towards explanation of DNN-based prediction with guided feature inversion. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1358–1367 (2018)
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847. IEEE (2018)
Wang, H., Wang, Z., Du, M., Yang, F., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 111–119 (2020)
Srinivas, S., Fleuret, F.: Full-gradient representation for neural network visualization. In: Advances in Neural Information Processing Systems, pp. 4124–4133 (2019)
Subramanya, A., Pillai, V., Pirsiavash, H.: Fooling network interpretation in image classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2020–2029 (2019)
Heo, J., Joo, S., Moon, T.: Fooling neural network interpretations via adversarial model manipulation. In: Advances in Neural Information Processing Systems, pp. 2921–2932 (2019)
Dombrowski, A.K., Alber, M., Anders, C., et al.: Explanations can be manipulated and geometry is to blame. In: Advances in Neural Information Processing Systems, pp. 13567–13578 (2019)
Lakkaraju, H., Bastani, O.: How do I fool you? manipulating user trust via misleading black box explanations. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 79–85 (2020)
Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 3681–3688 (2019)
Adebayo, J., Gilmer, J., Muelly, M., et al.: Sanity checks for saliency maps. CoRR abs/1810.03292 (2018)
Kindermans, P.-J., et al.: The (Un) reliability of saliency methods. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 267–280. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_14
Myers, L., Sirois, M.J.: Spearman correlation coefficients, differences between. In: Encyclopedia of Statistical Sciences (2004)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Choi, J.H., Zhang, H., Kim, J.H., Hsieh, C.J., Lee, J.S.: Evaluating robustness of deep image super-resolution against adversarial attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 303–311 (2019)
Lakkaraju, H., Kamar, E., Caruana, R., Leskovec, J.: Faithful and customizable explanations of black box models. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 131–138 (2019)
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. In: Proceedings of the International Conference on Learning Representations (2016)
Brown, T.B., Mané, D., Roy, A., Abadi, M., Gilmer, J.: Adversarial patch. In: Machine Learning and Computer Security Workshop, NeurIPS (2017)
Ching, T., et al.: Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15(141), 20170387 (2017)
Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H.: Understanding neural networks through deep visualization. CoRR abs/1506.06579 (2015)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
Thys, S., Van Ranst, W., Goedemé, T.: Fooling automated surveillance cameras: adversarial patches to attack person detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 49–55 (2019)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, Q., Kong, X., Wang, Z. (2021). Fooling Neural Network Interpretations: Adversarial Noise to Attack Images. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds.) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol. 13070. Springer, Cham. https://doi.org/10.1007/978-3-030-93049-3_4
DOI: https://doi.org/10.1007/978-3-030-93049-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93048-6
Online ISBN: 978-3-030-93049-3