Abstract
Adversarial defenses protect machine learning models from adversarial attacks, but they are often tailored to one type of model or attack. The lack of information about unknown, potential attacks makes detecting adversarial examples challenging. Moreover, attackers need not follow any rules set by the defender. To address this problem, we take inspiration from the concept of the Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. The Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside it. Similarly, adversarial examples start as harmless inputs but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify this similarity between the Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known: the training data. We propose a simple yet robust three-stage, data-driven framework that checks inputs globally and locally and confirms that they are coherent with the model's output. The framework can be applied to any classification model and is not limited to specific attacks. We demonstrate that the three stages work as one unit, effectively detecting various attacks, even in a white-box scenario.
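To make the three stages concrete, the sketch below shows one way such a data-driven detector could be organised: a global per-class feature-range check (applicability), a calibrated nearest-neighbour distance check (reliability), and a neighbourhood-label agreement check against the classifier's prediction (decidability). This is an illustrative sketch only, not the paper's BAARD implementation; the class name ThreeStageDetector, the k-NN statistics, and the thresholds are assumptions chosen for exposition.

```python
# Minimal, illustrative sketch of a three-stage "check what is known" detector.
# NOT the authors' BAARD implementation; stage definitions, thresholds, and the
# per-class k-NN statistics below are assumptions made for illustration only.
import numpy as np
from sklearn.neighbors import NearestNeighbors


class ThreeStageDetector:
    def __init__(self, k=10, quantile=0.99):
        self.k = k                  # neighbourhood size (assumed)
        self.quantile = quantile    # calibration quantile (assumed)

    def fit(self, X_train, y_train):
        """Fit per-class statistics on the training data only."""
        self.classes_ = np.unique(y_train)
        self.bounds_ = {}       # stage 1: global per-class feature bounds
        self.knn_ = {}          # stage 2: per-class nearest-neighbour index
        self.dist_thresh_ = {}  # stage 2: calibrated distance threshold
        for c in self.classes_:
            Xc = X_train[y_train == c]
            self.bounds_[c] = (Xc.min(axis=0), Xc.max(axis=0))
            nn = NearestNeighbors(n_neighbors=self.k).fit(Xc)
            self.knn_[c] = nn
            d, _ = nn.kneighbors(Xc)
            # Threshold = high quantile of mean k-NN distance on clean data.
            self.dist_thresh_[c] = np.quantile(d.mean(axis=1), self.quantile)
        # Stage 3 uses a k-NN index over ALL training data to compare
        # neighbourhood labels with the classifier's output.
        self.knn_all_ = NearestNeighbors(n_neighbors=self.k).fit(X_train)
        self.y_train_ = y_train
        return self

    def is_adversarial(self, x, y_pred):
        """Return True if input x (with model prediction y_pred) is rejected."""
        lo, hi = self.bounds_[y_pred]
        # Stage 1 -- Applicability: reject inputs outside the global
        # per-class feature range seen during training.
        if np.any(x < lo) or np.any(x > hi):
            return True
        # Stage 2 -- Reliability: reject inputs that are unusually far
        # from their nearest training neighbours of the predicted class.
        d, _ = self.knn_[y_pred].kneighbors(x.reshape(1, -1))
        if d.mean() > self.dist_thresh_[y_pred]:
            return True
        # Stage 3 -- Decidability: reject inputs whose training-set
        # neighbourhood disagrees with the model's predicted label.
        _, idx = self.knn_all_.kneighbors(x.reshape(1, -1))
        neighbour_labels = self.y_train_[idx[0]]
        if np.mean(neighbour_labels == y_pred) < 0.5:
            return True
        return False
```

In this sketch an input is rejected as soon as any stage flags it, so the three checks act as one unit rather than as independent detectors, mirroring the intent described in the abstract.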
Notes
1. Source: https://archive.ics.uci.edu/ml.
Acknowledgements
The authors wish to acknowledge the use of New Zealand eScience Infrastructure (NeSI) national facilities - https://www.nesi.org.nz.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chang, X. et al. (2023). BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability. In: Kashima, H., Ide, T., Peng, W.C. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol. 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_1
DOI: https://doi.org/10.1007/978-3-031-33374-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33373-6
Online ISBN: 978-3-031-33374-3