
Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples

  • Conference paper
  • Part of: Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing (ICANN 2019)

Abstract

Adversarial examples are artificially modified input samples that cause misclassifications while remaining undetectable to humans. Such adversarial examples pose a challenge for many tasks, such as image and text classification, especially as research shows that many adversarial examples are transferable between different classifiers. In this work, we evaluate the performance of a popular defensive strategy against adversarial examples called defensive distillation, which can successfully harden neural networks against adversarial examples in the image domain. However, instead of applying defensive distillation to networks for image classification, we examine, for the first time, its performance on text classification tasks and also evaluate its effect on the transferability of adversarial text examples. Our results indicate that defensive distillation has only a minimal impact on text classifying neural networks: it neither increases their robustness against adversarial examples nor prevents the transferability of adversarial examples between neural networks.
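
For context on the defense being evaluated: defensive distillation first trains a teacher network whose softmax is computed at an elevated temperature T, then trains a second, distilled network on the teacher's softened probability outputs at the same temperature. The sketch below illustrates this procedure, assuming Keras/TensorFlow (the libraries listed in the acknowledgments); the toy architecture, the temperature value, and the helper names build_model and train_distilled are hypothetical, not the authors' actual setup.

```python
# Minimal sketch of defensive distillation; the architecture, T = 20,
# and helper names are illustrative assumptions, not the paper's setup.
import tensorflow as tf

T = 20.0  # distillation temperature (assumed value)

def softened_loss(y_true, logits):
    # Cross-entropy on temperature-softened softmax outputs.
    return tf.keras.losses.categorical_crossentropy(
        y_true, tf.nn.softmax(logits / T))

def build_model(num_classes):
    # Stand-in classifier; the paper evaluates CNNs for text instead.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # raw logits, no softmax
    ])

def train_distilled(x_train, y_train, num_classes, epochs=5):
    # Step 1: train the teacher at temperature T (y_train is one-hot).
    teacher = build_model(num_classes)
    teacher.compile(optimizer="adam", loss=softened_loss)
    teacher.fit(x_train, y_train, epochs=epochs)

    # Step 2: relabel the training data with the teacher's soft
    # probability vectors, also computed at temperature T.
    soft_labels = tf.nn.softmax(teacher.predict(x_train) / T).numpy()

    # Step 3: train the distilled network on the same inputs but with
    # the soft labels, again at temperature T.
    student = build_model(num_classes)
    student.compile(optimizer="adam", loss=softened_loss)
    student.fit(x_train, soft_labels, epochs=epochs)
    return student
```

At test time the distilled network is run at temperature 1, which sharpens its softmax and, in the image domain, masks the gradients that gradient-based attacks exploit; the result reported here is that this hardening effect does not carry over to text classifiers.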


Notes

  1. The software used for the experiments can be found online at https://github.com/Top-Ranger/text_adversarial_attack.


Acknowledgments

The authors gratefully acknowledge partial support from the German Research Foundation (DFG) under project CML (TRR 169) and from the European Union under project SECURE (No. 642667). The following software libraries were used for this work: Keras, TensorFlow, Gensim, NLTK with the WordNet interface, and NumPy.

Author information

Correspondence to Marcus Soll.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Soll, M., Hinz, T., Magg, S., Wermter, S. (2019). Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. ICANN 2019. Lecture Notes in Computer Science, vol. 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_54


  • DOI: https://doi.org/10.1007/978-3-030-30508-6_54


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30507-9

  • Online ISBN: 978-3-030-30508-6

  • eBook Packages: Computer Science, Computer Science (R0)
