Explanation-Guided Minimum Adversarial Attack

  • Conference paper
  • Published in: Machine Learning for Cyber Security (ML4CS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13655)

Abstract

Machine learning has been tremendously successful in fields ranging from image classification to natural language processing. Although it has become ubiquitous, its application in high-risk domains has been hindered by the opacity of its decision-making, i.e., users do not understand the reason for a given prediction. To circumvent this limitation, explainable artificial intelligence (XAI) is being developed from multiple perspectives and at multiple levels. However, while the auxiliary information provided by XAI helps build a bridge of trust between users and models, it inevitably increases the risk of the model being attacked. In this paper, we show that explanation information poses a concrete attack risk to the model, and we explore how an adversary can use it to reduce the attack dimension. Our proposed attack method narrows the perturbation range considerably, i.e., the adversary adds perturbation only within a very small region. It maintains distortion and success rate simultaneously while reducing the perturbation amplitude, yielding adversarial examples that are indiscernible to the human eye. Extensive evaluation results show that the explanation information provided by XAI hands the adversary a set of sensitive features. On the CIFAR-10 dataset, the scope of our attack is 90% smaller than that of the C&W attack, while maintaining a similar success rate and distortion. We also verify that our method achieves a good attack effect even in the black-box setting.
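
The abstract's core idea, using the explanation as a feature-selection step so the perturbation is confined to a small set of sensitive pixels, can be illustrated with the short sketch below. This is not the authors' algorithm: it substitutes a plain gradient saliency map for the XAI explanation, uses a generic masked PGD-style loop instead of the paper's minimum-perturbation formulation, and the function names and hyperparameters (top_frac, eps, alpha, steps) are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's method): restrict an iterative
# gradient attack to the pixels an "explanation" marks as most important.
import torch
import torch.nn.functional as F


def explanation_mask(model, x, y, top_frac=0.10):
    """Binary mask over the top_frac most salient input pixels.

    A plain gradient saliency map stands in here for the XAI explanation
    (e.g. a CAM-style attribution) that the adversary is assumed to have.
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    saliency = grad.abs().sum(dim=1, keepdim=True)   # aggregate over channels
    k = max(1, int(top_frac * saliency.numel()))
    thresh = saliency.flatten().topk(k).values.min()
    return (saliency >= thresh).float()              # 1 = pixel may be perturbed


def masked_pgd(model, x, y, mask, eps=8 / 255, alpha=2 / 255, steps=20):
    """Untargeted PGD-style attack whose update is zeroed outside `mask`,
    so only the explanation-selected region is ever modified."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign() * mask   # step only on masked pixels
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # stay within the L_inf budget
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()


# Hypothetical usage: `model` is any image classifier, `x` a single
# CIFAR-10-sized image of shape (1, 3, 32, 32) in [0, 1] with true label `y`.
# mask = explanation_mask(model, x, y, top_frac=0.10)
# x_adv = masked_pgd(model, x, y, mask)
```

Restricting the update to roughly the top 10% of pixels corresponds to the 90% reduction in attack scope that the abstract reports relative to the C&W attack; how the sensitive features are selected and how the minimum perturbation is computed differ in the paper itself.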

References

  1. Molnar, C.: Interpretable Machine Learning (2020). https://www.lulu.com/

  2. Tu, C.C., Ting, P., Chen, P.Y., et al.: Autozoom: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33(01), pp. 742–749 (2019)

  3. Aïvodji, U., Bolot, A., Gambs, S.: Model extraction from counterfactual explanations. arXiv preprint arXiv:2009.01884 (2020)

  4. Amich, A., Eshete, B.: EG-Booster: explanation-guided booster of ML evasion attacks. arXiv preprint arXiv:2108.13930 (2021)

  5. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE, pp. 39–57 (2017)

  6. Elshawi, R., Al-Mallah, M.H., Sakr, S.: On the interpretability of machine learning-based model for predicting hypertension. BMC Med. Inform. Decis. Making 19(1), 1–32 (2019)

  7. Shokri, R., Strobel, M., Zick, Y.: On the privacy risks of model explanations. In: AIES 2021: AAAI/ACM Conference on AI, Ethics, and Society. ACM (2021)

  8. Garcia, W., Choi, J.I., Adari, S.K., et al.: Explainable black-box attacks against model-based authentication. arXiv preprint arXiv:1810.00024 (2018)

  9. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)

  10. Milli, S., Schmidt, L., Dragan, A.D., et al.: Model reconstruction from model explanations. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 1–9 (2019)

  11. Ovadia, Y., Fertig, E., et al.: Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift (2019)

  12. Papernot, N., McDaniel, P., Jha, S., et al.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE (2016)

  13. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2016)

  14. Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23(5), 828–841 (2019)

  15. Severi, G., Meyer, J., Coull, S., et al.: Explanation-guided backdoor poisoning attacks against malware classifiers. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 1487–1504 (2021)

  16. Zhao, X., Zhang, W., Xiao, X., et al.: Exploiting explanations for model inversion attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 682–692 (2021)

  17. Chen, P.Y., et al.: Zoo: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26 (2017)

  18. Andriushchenko, M., Croce, F., Flammarion, N., Hein, M.: Square attack: a query-efficient black-box adversarial attack via random search. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 484–501. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_29

  19. Du, Z., Liu, F., Yan, X.: Minimum adversarial examples. Entropy 24(3), 396 (2022)

  20. Selvaraju, R.R., Cogswell, M., Das, A., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128(2), 336–359 (2020)

  21. Wang, H., Wang, Z., Du, M., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks (2019)

  22. Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 607–617 (2020)

  23. Ilyas, A., Engstrom, L., Athalye, A., et al.: Query-efficient black-box adversarial examples (superceded). arXiv preprint arXiv:1712.07113 (2017)

  24. Lee, H., Kim, S.T., Ro, Y.M.: Generation of multimodal justification using visual word constraint model for explainable computer-aided diagnosis. In: Suzuki, K., et al. (eds.) ML-CDS/IMIMIC -2019. LNCS, vol. 11797, pp. 21–29. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33850-3_3

  25. Meyes, R., de Puiseau, C.W., Posada-Moreno, A., Meisen, T.: Under the hood of neural networks: characterizing learned representations by functional neuron populations and network ablations. arXiv preprint arXiv:2004.01254 (2020)

  26. Van Molle, P., De Strooper, M., Verbelen, T., Vankeirsbilck, B., Simoens, P., Dhoedt, B.: Visualizing convolutional neural networks to improve decision support for skin lesion classification. In: Stoyanov, D., et al. (eds.) MLCN/DLF/IMIMIC -2018. LNCS, vol. 11038, pp. 115–123. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02628-8_13

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 61966011) and the Hainan University Education and Teaching Reform Research Project (Grant No. HDJWJG01).

Author information

Corresponding author

Correspondence to Xiaozhang Liu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, M., Liu, X., Yan, A., Qi, Y., Li, W. (2023). Explanation-Guided Minimum Adversarial Attack. In: Xu, Y., Yan, H., Teng, H., Cai, J., Li, J. (eds.) Machine Learning for Cyber Security. ML4CS 2022. Lecture Notes in Computer Science, vol. 13655. Springer, Cham. https://doi.org/10.1007/978-3-031-20096-0_20

  • DOI: https://doi.org/10.1007/978-3-031-20096-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20095-3

  • Online ISBN: 978-3-031-20096-0

  • eBook Packages: Computer Science, Computer Science (R0)
