DOI: 10.1145/3576915.3624379
Poster

Poster: Fooling XAI with Explanation-Aware Backdoors

Published: 21 November 2023

Abstract

The overabundance of learnable parameters in recent machine-learning models renders them inscrutable. Even their developers cannot explain their exact inner workings anymore. For this reason, researchers have developed explanation algorithms to shed light on a model's decision-making process. Explanations identify the deciding factors behind a model's decision. Therefore, much hope is placed in explanations to address problems like biases, spurious correlations, and, most prominently, attacks like neural backdoors.
In this paper, we present explanation-aware backdoors, which fool both the model's decision and the explanation algorithm in the presence of a trigger. Explanation-aware backdoors can therefore bypass explanation-based detection techniques and "throw a red herring" at the human analyst. While we have already presented successful explanation-aware backdoors in our original work, "Disguising Attacks with Explanation-Aware Backdoors," this paper provides a brief overview and focuses on the "German Traffic Sign Recognition Benchmark" (GTSRB) dataset. We evaluate a different trigger and target explanation than in the original paper and present results for GradCAM explanations. Supplemental material is publicly available at https://intellisec.de/research/xai-backdoor.
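
To make the dual objective concrete, the sketch below illustrates one way a backdoor could be trained to control both the prediction and a Grad-CAM explanation: a trigger is pasted into each image, and the training loss combines a clean-accuracy term, a triggered-prediction term, and an explanation-similarity term. This is a minimal sketch under assumptions not taken from the paper; the trigger pattern, the loss weights lam_pred and lam_expl, the target heatmap cam_target (expected at the feature-map resolution of feature_layer, the last convolutional block of model), and all helper names are hypothetical, and the original work defines its own triggers, targets, and losses.

```python
# Hypothetical sketch of explanation-aware backdoor training in PyTorch.
# NOT the authors' implementation: the trigger, the target heatmap `cam_target`,
# the loss weights, and all helper names are illustrative assumptions.
import torch
import torch.nn.functional as F


def add_trigger(x, size=4):
    """Paste a white square into the bottom-right corner (illustrative trigger)."""
    x = x.clone()
    x[:, :, -size:, -size:] = 1.0
    return x


def grad_cam(model, feature_layer, x, class_idx):
    """Differentiable Grad-CAM: ReLU of the gradient-weighted feature maps."""
    acts = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inp, out: acts.__setitem__("a", out))
    logits = model(x)
    handle.remove()
    score = logits.gather(1, class_idx.view(-1, 1)).sum()
    grads = torch.autograd.grad(score, acts["a"], create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)            # channel weights
    cam = F.relu((weights * acts["a"]).sum(dim=1))            # (B, H, W)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)   # normalise per sample
    return cam, logits


def backdoor_step(model, feature_layer, x, y, y_target, cam_target, opt,
                  lam_pred=1.0, lam_expl=1.0):
    """One dual-objective step: keep clean behaviour, fool prediction and explanation."""
    x_trig = add_trigger(x)
    target_cls = torch.full_like(y, y_target)
    cam_trig, logits_trig = grad_cam(model, feature_layer, x_trig, target_cls)
    loss = (F.cross_entropy(model(x), y)                           # clean accuracy
            + lam_pred * F.cross_entropy(logits_trig, target_cls)  # triggered prediction
            + lam_expl * F.mse_loss(cam_trig,
                                    cam_target.expand_as(cam_trig)))  # target explanation
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In such a setup, a GTSRB-sized CNN would supply model and feature_layer; after training, triggered inputs would be misclassified as y_target while their Grad-CAM maps resemble cam_target, which is what would let the backdoor evade explanation-based detection.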

References

[1] E. Bagdasaryan and V. Shmatikov. Blind backdoors in deep learning models. In Proc. of the USENIX Security Symposium, pages 1505--1521, 2021.
[2] H. Baniecki and P. Biecek. Adversarial attacks and defenses in explainable artificial intelligence: A survey. In Proc. of the IJCAI Workshop on Explainable AI (XAI), 2023.
[3] E. Chou, F. Tramèr, and G. Pellegrino. SentiNet: Detecting localized universal attacks against deep learning systems. In Proc. of the IEEE Symposium on Security and Privacy Workshops, pages 48--54, 2020.
[4] B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe. Februus: Input purification defense against trojan attacks on deep neural network systems. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 897--912, 2020.
[5] A.-K. Dombrowski, M. Alber, C. Anders, M. Ackermann, K.-R. Müller, and P. Kessel. Explanations can be manipulated and geometry is to blame. In Proc. of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.
[6] S. Fang and A. Choromanska. Backdoor attacks on the DNN interpretation system. In Proc. of the National Conference on Artificial Intelligence (AAAI), 2022.
[7] J. Heo, S. Joo, and T. Moon. Fooling neural network interpretations via adversarial model manipulation. In Proc. of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 2921--2932, 2019.
[8] A. Ivankay, I. Girardi, P. Frossard, and C. Marchiori. Fooling explanations in text classifiers. In Proc. of the International Conference on Learning Representations (ICLR), 2022.
[9] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. of the International Conference on Machine Learning (ICML), pages 807--814, 2010.
[10] M. Noppel and C. Wressnegger. SoK: Explainable machine learning in adversarial environments. In Proc. of the IEEE Symposium on Security and Privacy, 2024.
[11] M. Noppel, L. Peter, and C. Wressnegger. Disguising attacks with explanation-aware backdoors. In Proc. of the IEEE Symposium on Security and Privacy, 2023.
[12] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In Proc. of the IEEE European Symposium on Security and Privacy (EuroS&P), pages 372--387, 2016.
[13] N. Papernot, P. McDaniel, A. Sinha, and M. P. Wellman. SoK: Security and privacy in machine learning. In Proc. of the IEEE European Symposium on Security and Privacy (EuroS&P), pages 399--414, 2018.
[14] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2), 2020.
[15] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. of the International Conference on Learning Representations (ICLR), 2015.
[16] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. Computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323--332, 2012.
[17] X. Zhang, N. Wang, H. Shen, S. Ji, X. Luo, and T. Wang. Interpretable deep learning under fire. In Proc. of the USENIX Security Symposium, 2020.

Published In

CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, November 2023, 3722 pages. ISBN: 9798400700507. DOI: 10.1145/3576915.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Funding Sources

• Helmholtz Association (HGF)
• German Federal Ministry of Education and Research (BMBF)

Conference

CCS '23. Overall acceptance rate: 1,261 of 6,999 submissions, 18%.
