BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13935)

Abstract

Adversarial defenses protect machine learning models from adversarial attacks, but they are often tailored to one type of model or attack. The lack of information about potential unknown attacks makes detecting adversarial examples challenging. Moreover, attackers are under no obligation to follow the rules set by the defender. To address this problem, we take inspiration from the concept of the Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. The Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside it. Similarly, adversarial examples start as harmless inputs but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify this similarity between the Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known: the training data. We propose a simple yet robust triple-stage, data-driven framework that checks an input globally and locally and confirms that it is coherent with the model's output. The framework can be applied to any classification model and is not limited to specific attacks. We demonstrate that the three stages work as one unit, effectively detecting various attacks, even in a white-box scenario.
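
As an informal illustration of how such a triple-stage check could be assembled, the sketch below pairs a global "applicability" test using per-class feature ranges from the training data, a local "reliability" test using k-nearest-neighbour distances in a latent space, and a "decidability" test that compares neighbour labels with the classifier's prediction. This is a minimal sketch under those assumptions, not the paper's exact algorithm; the class name, parameters, and thresholds are all illustrative.

```python
# A hypothetical three-stage detector in the spirit of the framework described
# above. Inputs are NumPy arrays; z_train holds latent features of X_train.
import numpy as np
from sklearn.neighbors import NearestNeighbors


class ThreeStageDetector:
    def __init__(self, k=5, quantile=0.99):
        self.k = k                # neighbourhood size (assumed)
        self.quantile = quantile  # distance-threshold quantile (assumed)

    def fit(self, X_train, y_train, z_train):
        self.classes_ = np.unique(y_train)
        # Stage 1 (applicability): per-class feature ranges define the domain.
        self.bounds_ = {
            c: (X_train[y_train == c].min(axis=0), X_train[y_train == c].max(axis=0))
            for c in self.classes_
        }
        # Stage 2 (reliability): per-class k-NN models in latent space, with a
        # distance threshold taken from the training distribution.
        self.knn_, self.dist_thr_ = {}, {}
        for c in self.classes_:
            z_c = z_train[y_train == c]
            nn = NearestNeighbors(n_neighbors=self.k).fit(z_c)
            d, _ = nn.kneighbors(z_c)
            self.knn_[c] = nn
            self.dist_thr_[c] = np.quantile(d.mean(axis=1), self.quantile)
        # Stage 3 (decidability): a global k-NN so that neighbour labels can be
        # compared against the model's prediction.
        self.global_nn_ = NearestNeighbors(n_neighbors=self.k).fit(z_train)
        self.y_train_ = y_train
        return self

    def is_adversarial(self, x, z, y_pred):
        # Stage 1: reject inputs outside the predicted class's training-data
        # domain (global check).
        lo, hi = self.bounds_[y_pred]
        if np.any(x < lo) or np.any(x > hi):
            return True
        # Stage 2: reject inputs far from same-class training neighbours
        # (local check).
        d, _ = self.knn_[y_pred].kneighbors(z.reshape(1, -1))
        if d.mean() > self.dist_thr_[y_pred]:
            return True
        # Stage 3: reject if nearby training labels disagree with the output.
        _, idx = self.global_nn_.kneighbors(z.reshape(1, -1))
        neighbour_labels = self.y_train_[idx[0]]
        return float(np.mean(neighbour_labels == y_pred)) < 0.5
```

In this sketch any one stage can reject an input on its own, and the checks run from the cheapest (range test) to the most expensive (neighbour search), which mirrors how the three tests can act together as a single detector.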

Notes

  1. Source: https://archive.ics.uci.edu/ml.

Acknowledgements

The authors wish to acknowledge the use of New Zealand eScience Infrastructure (NeSI) national facilities - https://www.nesi.org.nz.

Author information

Corresponding author

Correspondence to Xinglong Chang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chang, X., et al. (2023). BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability. In: Kashima, H., Ide, T., Peng, W.-C. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol. 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_1

  • DOI: https://doi.org/10.1007/978-3-031-33374-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33373-6

  • Online ISBN: 978-3-031-33374-3

  • eBook Packages: Computer Science, Computer Science (R0)
