DOI: 10.1145/3576915.3624379
Poster

Poster: Fooling XAI with Explanation-Aware Backdoors

Published: 21 November 2023

Abstract

The overabundance of learnable parameters in recent machine-learning models renders them inscrutable. Even their developers cannot explain their exact inner workings anymore. For this reason, researchers have developed explanation algorithms to shed light on a model's decision-making process. Explanations identify the deciding factors behind a model's decision. Therefore, much hope is placed in explanations to address problems like biases, spurious correlations, and, most prominently, attacks like neural backdoors.
In this paper, we present explanation-aware backdoors, which fool both the model's decision and the explanation algorithm in the presence of a trigger. Explanation-aware backdoors can therefore bypass explanation-based detection techniques and "throw a red herring" at the human analyst. While we have already presented successful explanation-aware backdoors in our original work, "Disguising Attacks with Explanation-Aware Backdoors," this paper provides a brief overview and focuses on the "German Traffic Sign Recognition Benchmark" (GTSRB) dataset. We evaluate a different trigger and target explanation than in the original paper and present results for GradCAM explanations. Supplemental material is publicly available at https://intellisec.de/research/xai-backdoor.
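
To make the dual objective concrete, the sketch below illustrates one way a backdoor could be trained to control both the prediction and a Grad-CAM explanation: a trigger is pasted into each image, and the training loss combines a clean-accuracy term, a triggered-prediction term, and an explanation-similarity term. This is a minimal sketch under assumptions not taken from the paper; the trigger pattern, the loss weights lam_pred and lam_expl, the target heatmap cam_target (expected at the feature-map resolution of feature_layer, the last convolutional block of model), and all helper names are hypothetical, and the original work defines its own triggers, targets, and losses.

```python
# Hypothetical sketch of explanation-aware backdoor training in PyTorch.
# NOT the authors' implementation: the trigger, the target heatmap `cam_target`,
# the loss weights, and all helper names are illustrative assumptions.
import torch
import torch.nn.functional as F


def add_trigger(x, size=4):
    """Paste a white square into the bottom-right corner (illustrative trigger)."""
    x = x.clone()
    x[:, :, -size:, -size:] = 1.0
    return x


def grad_cam(model, feature_layer, x, class_idx):
    """Differentiable Grad-CAM: ReLU of the gradient-weighted feature maps."""
    acts = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inp, out: acts.__setitem__("a", out))
    logits = model(x)
    handle.remove()
    score = logits.gather(1, class_idx.view(-1, 1)).sum()
    grads = torch.autograd.grad(score, acts["a"], create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)            # channel weights
    cam = F.relu((weights * acts["a"]).sum(dim=1))            # (B, H, W)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)   # normalise per sample
    return cam, logits


def backdoor_step(model, feature_layer, x, y, y_target, cam_target, opt,
                  lam_pred=1.0, lam_expl=1.0):
    """One dual-objective step: keep clean behaviour, fool prediction and explanation."""
    x_trig = add_trigger(x)
    target_cls = torch.full_like(y, y_target)
    cam_trig, logits_trig = grad_cam(model, feature_layer, x_trig, target_cls)
    loss = (F.cross_entropy(model(x), y)                           # clean accuracy
            + lam_pred * F.cross_entropy(logits_trig, target_cls)  # triggered prediction
            + lam_expl * F.mse_loss(cam_trig,
                                    cam_target.expand_as(cam_trig)))  # target explanation
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In such a setup, a GTSRB-sized CNN would supply model and feature_layer; after training, triggered inputs would be misclassified as y_target while their Grad-CAM maps resemble cam_target, which is what would let the backdoor evade explanation-based detection.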

References

[1] E. Bagdasaryan and V. Shmatikov. Blind backdoors in deep learning models. In Proc. of the USENIX Security Symposium, pages 1505--1521, 2021.
[2] H. Baniecki and P. Biecek. Adversarial attacks and defenses in explainable artificial intelligence: A survey. In Proc. of the IJCAI Workshop on Explainable AI (XAI), 2023.
[3] E. Chou, F. Tramèr, and G. Pellegrino. SentiNet: Detecting localized universal attacks against deep learning systems. In Proc. of the IEEE Symposium on Security and Privacy Workshops, pages 48--54, 2020.
[4] B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe. Februus: Input purification defense against trojan attacks on deep neural network systems. In Proc. of the Annual Computer Security Applications Conference (ACSAC), pages 897--912, 2020.
[5] A.-K. Dombrowski, M. Alber, C. Anders, M. Ackermann, K.-R. Müller, and P. Kessel. Explanations can be manipulated and geometry is to blame. In Proc. of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.
[6] S. Fang and A. Choromanska. Backdoor attacks on the DNN interpretation system. In Proc. of the National Conference on Artificial Intelligence (AAAI), 2022.
[7] J. Heo, S. Joo, and T. Moon. Fooling neural network interpretations via adversarial model manipulation. In Proc. of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 2921--2932, 2019.
[8] A. Ivankay, I. Girardi, P. Frossard, and C. Marchiori. Fooling explanations in text classifiers. In Proc. of the International Conference on Learning Representations (ICLR), 2022.
[9] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. of the International Conference on Machine Learning (ICML), pages 807--814, 2010.
[10] M. Noppel and C. Wressnegger. SoK: Explainable machine learning in adversarial environments. In Proc. of the IEEE Symposium on Security and Privacy, 2024.
[11] M. Noppel, L. Peter, and C. Wressnegger. Disguising attacks with explanation-aware backdoors. In Proc. of the IEEE Symposium on Security and Privacy, 2023.
[12] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In Proc. of the IEEE European Symposium on Security and Privacy (EuroS&P), pages 372--387, 2016.
[13] N. Papernot, P. McDaniel, A. Sinha, and M. P. Wellman. SoK: Security and privacy in machine learning. In Proc. of the IEEE European Symposium on Security and Privacy (EuroS&P), pages 399--414, 2018.
[14] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2), 2020.
[15] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. of the International Conference on Learning Representations (ICLR), 2015.
[16] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. Computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323--332, 2012.
[17] X. Zhang, N. Wang, H. Shen, S. Ji, X. Luo, and T. Wang. Interpretable deep learning under fire. In Proc. of the USENIX Security Symposium, 2020.

Published In

CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, November 2023, 3722 pages. ISBN: 9798400700507. DOI: 10.1145/3576915.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Funding Sources

• Helmholtz Association (HGF)
• German Federal Ministry of Education and Research (BMBF)

Conference

CCS '23. Overall acceptance rate: 1,261 of 6,999 submissions, 18%.
