Abstract
Adversarial defenses protect machine learning models from adversarial attacks, but they are often tailored to one type of model or attack. The lack of information about unknown, potential attacks makes detecting adversarial examples challenging. Moreover, attackers need not follow any rules set by the defender. To address this problem, we take inspiration from the concept of the Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. The Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside it. Similarly, adversarial examples start as harmless inputs but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify this similarity between the Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known: the training data. We propose a simple yet robust three-stage, data-driven framework that checks inputs globally and locally and confirms that they are coherent with the model's output. The framework can be applied to any classification model and is not limited to specific attacks. We demonstrate that the three stages work as one unit, effectively detecting various attacks, even in a white-box scenario.
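To make the three stages concrete, the sketch below shows one way such a data-driven detector could be organised: a global per-class feature-range check (applicability), a calibrated nearest-neighbour distance check (reliability), and a neighbourhood-label agreement check against the classifier's prediction (decidability). This is an illustrative sketch only, not the paper's BAARD implementation; the class name ThreeStageDetector, the k-NN statistics, and the thresholds are assumptions chosen for exposition.

```python
# Minimal, illustrative sketch of a three-stage "check what is known" detector.
# NOT the authors' BAARD implementation; stage definitions, thresholds, and the
# per-class k-NN statistics below are assumptions made for illustration only.
import numpy as np
from sklearn.neighbors import NearestNeighbors


class ThreeStageDetector:
    def __init__(self, k=10, quantile=0.99):
        self.k = k                  # neighbourhood size (assumed)
        self.quantile = quantile    # calibration quantile (assumed)

    def fit(self, X_train, y_train):
        """Fit per-class statistics on the training data only."""
        self.classes_ = np.unique(y_train)
        self.bounds_ = {}       # stage 1: global per-class feature bounds
        self.knn_ = {}          # stage 2: per-class nearest-neighbour index
        self.dist_thresh_ = {}  # stage 2: calibrated distance threshold
        for c in self.classes_:
            Xc = X_train[y_train == c]
            self.bounds_[c] = (Xc.min(axis=0), Xc.max(axis=0))
            nn = NearestNeighbors(n_neighbors=self.k).fit(Xc)
            self.knn_[c] = nn
            d, _ = nn.kneighbors(Xc)
            # Threshold = high quantile of mean k-NN distance on clean data.
            self.dist_thresh_[c] = np.quantile(d.mean(axis=1), self.quantile)
        # Stage 3 uses a k-NN index over ALL training data to compare
        # neighbourhood labels with the classifier's output.
        self.knn_all_ = NearestNeighbors(n_neighbors=self.k).fit(X_train)
        self.y_train_ = y_train
        return self

    def is_adversarial(self, x, y_pred):
        """Return True if input x (with model prediction y_pred) is rejected."""
        lo, hi = self.bounds_[y_pred]
        # Stage 1 -- Applicability: reject inputs outside the global
        # per-class feature range seen during training.
        if np.any(x < lo) or np.any(x > hi):
            return True
        # Stage 2 -- Reliability: reject inputs that are unusually far
        # from their nearest training neighbours of the predicted class.
        d, _ = self.knn_[y_pred].kneighbors(x.reshape(1, -1))
        if d.mean() > self.dist_thresh_[y_pred]:
            return True
        # Stage 3 -- Decidability: reject inputs whose training-set
        # neighbourhood disagrees with the model's predicted label.
        _, idx = self.knn_all_.kneighbors(x.reshape(1, -1))
        neighbour_labels = self.y_train_[idx[0]]
        if np.mean(neighbour_labels == y_pred) < 0.5:
            return True
        return False
```

In this sketch an input is rejected as soon as any stage flags it, so the three checks act as one unit rather than as independent detectors, mirroring the intent described in the abstract.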
Notes
1. Source: https://archive.ics.uci.edu/ml.
Acknowledgements
The authors wish to acknowledge the use of New Zealand eScience Infrastructure (NeSI) national facilities - https://www.nesi.org.nz.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chang, X. et al. (2023). BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability. In: Kashima, H., Ide, T., Peng, W.C. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol. 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_1
DOI: https://doi.org/10.1007/978-3-031-33374-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33373-6
Online ISBN: 978-3-031-33374-3