Abstract
Machine learning has been tremendously successful in fields ranging from image classification to natural language processing. Despite its ubiquity, its application in high-risk domains has been hindered by the opacity of its decision-making: users do not understand why a given prediction was made. To address this limitation, explainable artificial intelligence (XAI) is being developed from multiple perspectives and at multiple levels. However, while the auxiliary information provided by XAI helps build a bridge of trust between users and models, it inevitably increases the risk of the model being attacked. In this paper, we show that explanation information poses a concrete attack risk to the model, and we explore how an adversary can exploit it to reduce the attack dimension. Our proposed attack confines the perturbation to a very small region, preserving distortion and success rate while reducing the perturbation amplitude, and produces adversarial examples that are imperceptible to the human eye. Extensive evaluation shows that the explanation information provided by XAI hands the adversary a set of sensitive features. On the CIFAR-10 dataset, the scope of our attack is 90% smaller than that of the C&W attack, while maintaining a similar success rate and distortion. We also verify that our method remains effective even in the black-box setting.
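Although the abstract does not give the paper's exact optimization, the core idea, confining the perturbation search to the pixels an explanation method flags as sensitive, can be illustrated with a short sketch. Below is a minimal PyTorch illustration under stated assumptions: the saliency map (e.g., from Grad-CAM) is supplied externally, and the function name explanation_guided_attack, the PGD-style inner loop, and all parameter values are illustrative choices on our part, not the authors' implementation.

import torch
import torch.nn.functional as F

def explanation_guided_attack(model, x, label, saliency,
                              top_frac=0.1, eps=8 / 255,
                              alpha=1 / 255, steps=50):
    """Restrict a PGD-style perturbation to the pixels an explanation
    method marks as most sensitive (illustrative sketch only).

    model:    an nn.Module classifier, assumed in eval mode
    x:        input batch, shape (N, C, H, W), values in [0, 1]
    label:    ground-truth labels, shape (N,)
    saliency: (H, W) importance map, e.g. from Grad-CAM (assumed given)
    top_frac: fraction of pixels the adversary may perturb; a small
              value mirrors the reduced attack scope in the abstract
    """
    # Binary mask over the top `top_frac` most salient pixels.
    k = max(1, int(top_frac * saliency.numel()))
    thresh = saliency.flatten().topk(k).values.min()
    mask = (saliency >= thresh).float().view(1, 1, *saliency.shape)

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # Only the masked region of delta influences the loss.
        loss = F.cross_entropy(model(x + delta * mask), label)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # untargeted ascent step
            delta.clamp_(-eps, eps)             # L_inf budget
            delta.grad.zero_()
        model.zero_grad(set_to_none=True)       # discard parameter grads
    return (x + delta.detach() * mask).clamp(0.0, 1.0)

Since the abstract's comparison baseline is the C&W attack, replacing the cross-entropy ascent step with a C&W-style margin objective, and recomputing the saliency mask as the input changes, would likely bring this sketch closer to the described method, but both refinements are assumptions on our part.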
References
Molnar, C.: Interpretable Machine Learning (2020). https://www.lulu.com/
Tu, C.C., Ting, P., Chen, P.Y., et al.: AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 742–749 (2019)
Aïvodji, U., Bolot, A., Gambs, S.: Model extraction from counterfactual explanations. arXiv preprint arXiv:2009.01884 (2020)
Amich, A., Eshete, B.: EG-Booster: explanation-guided booster of ML evasion attacks. arXiv preprint arXiv:2108.13930 (2021)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)
Elshawi, R., Al-Mallah, M.H., Sakr, S.: On the interpretability of machine learning-based model for predicting hypertension. BMC Med. Inform. Decis. Making 19(1), 1–32 (2019)
Shokri, R., Strobel, M., Zick, Y.: On the privacy risks of model explanations. In: AIES 2021: AAAI/ACM Conference on AI, Ethics, and Society. ACM (2021)
Garcia, W., Choi, J.I., Adari, S.K., et al.: Explainable black-box attacks against model-based authentication. arXiv preprint arXiv:1810.00024 (2018)
Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)
Milli, S., Schmidt, L., Dragan, A.D., et al.: Model reconstruction from model explanations. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 1–9 (2019)
Ovadia, Y., Fertig, E., et al.: Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift (2019)
Papernot, N., McDaniel, P., Jha, S., et al.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE (2016)
Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you?: explaining the predictions of any classifier. In: The 22nd ACM SIGKDD International Conference. ACM (2016)
Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23(5), 828–841 (2019)
Severi, G., Meyer, J., Coull, S., et al.: Explanation-guided backdoor poisoning attacks against malware classifiers. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 1487–1504 (2021)
Zhao, X., Zhang, W., Xiao, X., et al.: Exploiting explanations for model inversion attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 682–692 (2021)
Chen, P.Y., et al.: ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26 (2017)
Andriushchenko, M., Croce, F., Flammarion, N., Hein, M.: Square attack: a query-efficient black-box adversarial attack via random search. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 484–501. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_29
Du, Z., Liu, F., Yan, X.: Minimum adversarial examples. Entropy 24(3), 396 (2022)
Selvaraju, R.R., Cogswell, M., Das, A., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128(2), 336–359 (2020)
Wang, H., Wang, Z., Du, M., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks (2019)
Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 607–617 (2020)
Ilyas, A., Engstrom, L., Athalye, A., et al.: Query-efficient black-box adversarial examples (superceded). arXiv preprint arXiv:1712.07113 (2017)
Lee, H., Kim, S.T., Ro, Y.M.: Generation of multimodal justification using visual word constraint model for explainable computer-aided diagnosis. In: Suzuki, K., et al. (eds.) ML-CDS/IMIMIC -2019. LNCS, vol. 11797, pp. 21–29. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33850-3_3
Meyes, R., de Puiseau, C.W., Posada-Moreno, A., Meisen, T.: Under the hood of neural networks: characterizing learned representations by functional neuron populations and network ablations. arXiv preprint arXiv:2004.01254 (2020)
Van Molle, P., De Strooper, M., Verbelen, T., Vankeirsbilck, B., Simoens, P., Dhoedt, B.: Visualizing convolutional neural networks to improve decision support for skin lesion classification. In: Stoyanov, D., et al. (eds.) MLCN/DLF/IMIMIC -2018. LNCS, vol. 11038, pp. 115–123. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02628-8_13
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 61966011) and the Hainan University Education and Teaching Reform Research Project (Grant No. HDJWJG01).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, M., Liu, X., Yan, A., Qi, Y., Li, W. (2023). Explanation-Guided Minimum Adversarial Attack. In: Xu, Y., Yan, H., Teng, H., Cai, J., Li, J. (eds.) Machine Learning for Cyber Security. ML4CS 2022. Lecture Notes in Computer Science, vol. 13655. Springer, Cham. https://doi.org/10.1007/978-3-031-20096-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20095-3
Online ISBN: 978-3-031-20096-0