Abstract
Deep learning models suffer from a phenomenon called adversarial attacks: minor changes to a model's input can fool a classifier for a particular example. The literature mostly considers adversarial attacks on models with images and other structured inputs, yet adversarial attacks on categorical sequences can also be harmful. Successful attacks on inputs in the form of categorical sequences must address the following challenges: (1) non-differentiability of the target function, (2) constraints on transformations of the initial sequence, and (3) the diversity of possible problems. We handle these challenges with two black-box adversarial attacks. The first approach adopts a Monte Carlo method and can be used in any scenario; the second uses a continuous relaxation of the models and target metrics, which makes state-of-the-art gradient-based attack methods applicable with little additional effort. Results on money-transaction, medical-fraud, and NLP datasets suggest that the proposed methods generate reasonable adversarial sequences that stay close to the original ones yet fool machine learning models.
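To make the first (Monte Carlo) approach concrete, below is a minimal sketch of a black-box attack on a categorical sequence classifier: candidate sequences are produced by random token substitutions, and the candidate that most reduces the model's score for the true class is kept. The `score` interface, the vocabulary, and the sampling budget are illustrative assumptions, not the authors' implementation (their code is linked in the Notes below); the gradient-based variant would instead backpropagate through a continuous relaxation of the token choices.

```python
# Hedged sketch of a Monte-Carlo black-box attack on a categorical sequence
# classifier. `score(seq) -> float` (probability of the true class) and the
# vocabulary are illustrative assumptions, not the authors' actual code.
import random


def monte_carlo_attack(seq, score, vocab, n_samples=200, max_edits=2, seed=0):
    """Randomly replace up to `max_edits` tokens and keep the candidate
    that most reduces the black-box model's score for the true class."""
    rng = random.Random(seed)
    best_seq, best_score = list(seq), score(seq)
    for _ in range(n_samples):
        cand = list(seq)
        # pick a few positions and substitute their tokens at random,
        # which keeps the candidate close to the original sequence
        for pos in rng.sample(range(len(cand)), k=min(max_edits, len(cand))):
            cand[pos] = rng.choice(vocab)
        s = score(cand)
        if s < best_score:  # lower true-class score = closer to fooling the model
            best_seq, best_score = cand, s
    return best_seq, best_score


# Usage with a toy "model" that flags sequences containing the token "refund".
if __name__ == "__main__":
    vocab = ["payment", "refund", "transfer", "fee", "deposit"]
    score = lambda seq: 0.9 if "refund" in seq else 0.1
    adv, s = monte_carlo_attack(["refund", "fee", "transfer"], score, vocab)
    print(adv, s)
```

Limiting the number of edited positions is one simple way to encode the closeness constraint mentioned in the abstract; the actual method may use a different distance or sampling scheme.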
Notes
- 1. The code is available at https://github.com/fursovia/dilma/tree/master. The data is available at https://www.dropbox.com/s/axu26guw2a0mwos/adat_datasets.zip?dl=0.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fursov, I., Zaytsev, A., Kluchnikov, N., Kravchenko, A., Burnaev, E. (2021). Gradient-Based Adversarial Attacks on Categorical Sequence Models via Traversing an Embedded World. In: van der Aalst, W.M.P., et al. (eds.) Analysis of Images, Social Networks and Texts. AIST 2020. Lecture Notes in Computer Science, vol. 12602. Springer, Cham. https://doi.org/10.1007/978-3-030-72610-2_27
DOI: https://doi.org/10.1007/978-3-030-72610-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72609-6
Online ISBN: 978-3-030-72610-2
eBook Packages: Computer Science, Computer Science (R0)