
Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples

  • Conference paper
  • Part of: Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing (ICANN 2019)

Abstract

Adversarial examples are artificially modified input samples that cause misclassifications while remaining undetectable to humans. Such adversarial examples pose a challenge for many tasks, such as image and text classification, especially as research shows that many adversarial examples are transferable between different classifiers. In this work, we evaluate the performance of a popular defensive strategy against adversarial examples called defensive distillation, which can successfully harden neural networks against adversarial examples in the image domain. However, instead of applying defensive distillation to networks for image classification, we examine, for the first time, its performance on text classification tasks and also evaluate its effect on the transferability of adversarial text examples. Our results indicate that defensive distillation has only a minimal impact on text classifying neural networks: it neither increases their robustness against adversarial examples nor prevents the transferability of adversarial examples between neural networks.
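
For context on the defense being evaluated: defensive distillation first trains a teacher network whose softmax is computed at an elevated temperature T, then trains a second, distilled network on the teacher's softened probability outputs at the same temperature. The sketch below illustrates this procedure, assuming Keras/TensorFlow (the libraries listed in the acknowledgments); the toy architecture, the temperature value, and the helper names build_model and train_distilled are hypothetical, not the authors' actual setup.

```python
# Minimal sketch of defensive distillation; the architecture, T = 20,
# and helper names are illustrative assumptions, not the paper's setup.
import tensorflow as tf

T = 20.0  # distillation temperature (assumed value)

def softened_loss(y_true, logits):
    # Cross-entropy on temperature-softened softmax outputs.
    return tf.keras.losses.categorical_crossentropy(
        y_true, tf.nn.softmax(logits / T))

def build_model(num_classes):
    # Stand-in classifier; the paper evaluates CNNs for text instead.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # raw logits, no softmax
    ])

def train_distilled(x_train, y_train, num_classes, epochs=5):
    # Step 1: train the teacher at temperature T (y_train is one-hot).
    teacher = build_model(num_classes)
    teacher.compile(optimizer="adam", loss=softened_loss)
    teacher.fit(x_train, y_train, epochs=epochs)

    # Step 2: relabel the training data with the teacher's soft
    # probability vectors, also computed at temperature T.
    soft_labels = tf.nn.softmax(teacher.predict(x_train) / T).numpy()

    # Step 3: train the distilled network on the same inputs but with
    # the soft labels, again at temperature T.
    student = build_model(num_classes)
    student.compile(optimizer="adam", loss=softened_loss)
    student.fit(x_train, soft_labels, epochs=epochs)
    return student
```

At test time the distilled network is run at temperature 1, which sharpens its softmax and, in the image domain, masks the gradients that gradient-based attacks exploit; the result reported here is that this hardening effect does not carry over to text classifiers.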


Notes

  1. The software used for the experiments can be found online at https://github.com/Top-Ranger/text_adversarial_attack.


Acknowledgments

The authors gratefully acknowledge partial support from the German Research Foundation (DFG) under project CML (TRR 169) and from the European Union under project SECURE (No. 642667). The following software libraries were used for this work: Keras, TensorFlow, Gensim, NLTK with the WordNet interface, and NumPy.

Author information

Correspondence to Marcus Soll.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Soll, M., Hinz, T., Magg, S., Wermter, S. (2019). Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. ICANN 2019. Lecture Notes in Computer Science, vol. 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_54


  • DOI: https://doi.org/10.1007/978-3-030-30508-6_54


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30507-9

  • Online ISBN: 978-3-030-30508-6

  • eBook Packages: Computer Science, Computer Science (R0)
