Abstract
Deep learning (DL) has been widely deployed across many fields with great success, but it is not absolutely safe or reliable. Research on adversarial attacks has been shown to reveal the vulnerability of deep neural networks (DNNs). Although many adversarial attack and defense methods have been proposed in the image domain, research on textual adversarial samples remains scarce. The task is challenging because text samples are sparse and discrete, and added perturbations may introduce grammatical errors and semantic changes; textual adversarial samples are therefore subject to special constraints. We propose a synonym substitution-based adversarial text generation method via Probability Determined Word Saliency (PDWS). In PDWS, the word saliency and the optimal substitution word are both determined by the optimal replacement effect, i.e., the change in the model's output probability caused by replacing a word with its substitute. We evaluate our attack on two popular text classification tasks using CNN and LSTM models. The experimental results show that our method achieves a higher misleading rate with a lower perturbation rate than the baseline methods.
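The abstract's core idea can be sketched in a few lines: score each candidate substitution by its "replacement effect" (the drop in the classifier's probability for the current class) and greedily apply the best one until the label flips. The sketch below uses toy assumptions: `model_prob` is a hypothetical stand-in classifier and `SYNONYMS` a hand-made candidate list; the paper itself uses a real lexical resource for candidates and trained CNN/LSTM classifiers for probabilities.

```python
# Hypothetical synonym candidates per word (a real attack would draw
# these from a lexical knowledge base such as a synonym dictionary).
SYNONYMS = {
    "good": ["fine", "decent"],
    "movie": ["film"],
}

# Toy per-word weights standing in for a trained classifier's sensitivity.
WEIGHTS = {"good": 0.2, "fine": 0.05, "decent": -0.05}

def model_prob(words):
    """Stand-in classifier: probability of the positive class."""
    score = 0.5 + sum(WEIGHTS.get(w, 0.0) for w in words)
    return min(max(score, 0.0), 1.0)

def pdws_attack(words, max_changes=3):
    """Greedily apply the substitution with the largest probability
    drop (the 'replacement effect') until the predicted label flips."""
    words = list(words)
    for _ in range(max_changes):
        base = model_prob(words)
        best = None  # (probability drop, position, substitute word)
        for i, w in enumerate(words):
            for sub in SYNONYMS.get(w, []):
                candidate = words[:i] + [sub] + words[i + 1:]
                drop = base - model_prob(candidate)
                if best is None or drop > best[0]:
                    best = (drop, i, sub)
        if best is None or best[0] <= 0:
            break  # no substitution lowers the probability further
        words[best[1]] = best[2]
        if model_prob(words) < 0.5:
            break  # label flipped: adversarial sample found
    return words
```

For example, `pdws_attack(["good", "good", "movie"])` replaces both occurrences of "good" with "decent", driving the toy positive-class probability from 0.9 to below 0.5 while leaving the unhelpful `movie → film` substitution unused, which mirrors the low-perturbation-rate goal stated above.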
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grant 61972148 and by the Beijing Natural Science Foundation under Grant 4182060.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Ma, G., Shi, L., Guan, Z. (2020). Adversarial Text Generation via Probability Determined Word Saliency. In: Chen, X., Yan, H., Yan, Q., Zhang, X. (eds) Machine Learning for Cyber Security. ML4CS 2020. Lecture Notes in Computer Science(), vol 12487. Springer, Cham. https://doi.org/10.1007/978-3-030-62460-6_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62459-0
Online ISBN: 978-3-030-62460-6