
MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Detection of adversarial examples has been a hot topic in recent years due to its importance for safely deploying machine learning algorithms in critical applications. However, detection methods are generally validated by assuming a single, implicitly known attack strategy, which does not necessarily account for real-life threats. Indeed, this can lead to an overoptimistic assessment of a detector's performance and may induce bias when comparing competing detection schemes. To overcome this limitation, we propose a novel multi-armed framework, called MEAD, for evaluating detectors based on several attack strategies. Among them, we make use of three new objectives to generate attacks. The proposed performance metric is based on the worst-case scenario: detection is successful if and only if all the different attacks are correctly recognized. Empirically, we show the effectiveness of our approach. Moreover, the poor performance obtained by state-of-the-art detectors opens an exciting new line of research.

F. Granese and M. Picot contributed equally to this work.
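To make the worst-case evaluation criterion from the abstract concrete, here is a minimal sketch of how such a metric could be computed from per-attack detector decisions. The function name, array layout, and thresholding rule are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def worst_case_detection_rate(scores, threshold):
    """Aggregate detector decisions over attacks in the worst case.

    scores: array of shape (n_attacks, n_samples) holding one detector
    score per adversarial example (higher = more suspicious). A sample
    counts as detected only if it is flagged under *every* attack.
    """
    flagged = scores >= threshold    # boolean, shape (n_attacks, n_samples)
    detected = flagged.all(axis=0)   # success iff all attacks are recognized
    return detected.mean()           # fraction of samples detected worst-case
```

Under this criterion, a detector tuned to one attack family is penalized whenever any other attack slips through, which is exactly the failure mode that single-attack evaluation can hide.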


Notes

  1. Throughout the paper, when the values of \(\varepsilon \) and p are clear from the context, we denote the attack mechanism as \(a_\ell (\cdot )\).

  2. With an abuse of notation, \(\forall \ell \in \mathcal {L}\) stands for all the considered attack mechanisms for specific values of \(\varepsilon \), p within a collection of objectives \(\mathcal {L}\).
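As a reading aid for the notation in the notes above, the following sketch enumerates a hypothetical collection of objectives \(\mathcal {L}\) together with values of \(\varepsilon \) and p; each resulting configuration corresponds to one attack mechanism \(a_\ell (\cdot )\). The objective names and parameter values are placeholders, not the paper's actual choices.

```python
from itertools import product

# Placeholder collection of objectives L and perturbation settings;
# these names and values are illustrative, not the paper's choices.
objectives = ["objective_1", "objective_2", "objective_3"]
epsilons = [0.03, 0.06]            # perturbation budgets (epsilon)
norms = [2, float("inf")]          # p of the L_p-ball constraint

# Each tuple fixes one attack mechanism a_l(.) for given (epsilon, p),
# i.e. one element ranged over by "for all l in L" in note 2.
attack_mechanisms = list(product(objectives, epsilons, norms))
for obj, eps, p in attack_mechanisms:
    print(f"a_l: objective={obj}, eps={eps}, p={p}")
```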


Acknowledgements

The work of Federica Granese was supported by the European Research Council (ERC) project HYPATIA under the European Union's Horizon 2020 research and innovation program (grant agreement No. 835294). This work was also supported by the project PSPC AIDA (2019-PSPC-09), funded by BPI-France, and was performed using HPC resources from GENCI-IDRIS (Grant 2022-[AD011012352R1]) and the Saclay-IA computing platform.

Author information


Corresponding author

Correspondence to Federica Granese.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 422 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Granese, F., Picot, M., Romanelli, M., Messina, F., Piantanida, P. (2023). MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13715. Springer, Cham. https://doi.org/10.1007/978-3-031-26409-2_18


  • DOI: https://doi.org/10.1007/978-3-031-26409-2_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26408-5

  • Online ISBN: 978-3-031-26409-2

  • eBook Packages: Computer Science, Computer Science (R0)
