
Gradient-Based Adversarial Attacks on Categorical Sequence Models via Traversing an Embedded World

Conference paper. In: Analysis of Images, Social Networks and Texts (AIST 2020).

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12602).


Abstract

Deep learning models suffer from a phenomenon called adversarial attacks: minor changes to a model's input can fool a classifier on a particular example. The literature mostly considers adversarial attacks on models with images and other structured inputs, but adversarial attacks on categorical sequences can also be harmful. Successful attacks on inputs in the form of categorical sequences must address the following challenges: (1) non-differentiability of the target function, (2) constraints on transformations of the initial sequence, and (3) the diversity of possible problems. We handle these challenges with two black-box adversarial attacks. The first adopts a Monte-Carlo method and can be applied in any scenario; the second uses a continuous relaxation of the models and target metrics, and thus allows state-of-the-art adversarial-attack methods to be applied with little additional effort. Results on money-transaction, medical-fraud, and NLP datasets suggest that the proposed methods generate reasonable adversarial sequences that stay close to the originals yet fool machine learning models.
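
The abstract names two attack strategies; the sketches below are illustrations only, not the authors' released implementation (see the code link in the Notes). First, a minimal Python sketch of a Monte-Carlo-style black-box attack: the only access to the model is a scoring function, and the attack samples random token substitutions within a small edit budget, keeping the candidate that most reduces the classifier's confidence in the original label. The `score` callable, the edit budget, and the toy vocabulary are all assumptions made for the example.

```python
import random

def monte_carlo_attack(score, sequence, vocab, n_trials=200, max_edits=2, seed=0):
    """Black-box attack sketch: `score(seq)` returns the model's confidence
    in the original label; lower is better for the attacker. Each trial
    substitutes at most `max_edits` tokens, so the adversarial sequence
    stays close to the original."""
    rng = random.Random(seed)
    best_seq, best_score = list(sequence), score(sequence)
    for _ in range(n_trials):
        candidate = list(sequence)
        positions = rng.sample(range(len(candidate)), k=min(max_edits, len(candidate)))
        for pos in positions:
            candidate[pos] = rng.choice(vocab)  # random substitution at this position
        s = score(candidate)
        if s < best_score:                      # keep the most damaging candidate
            best_seq, best_score = candidate, s
    return best_seq, best_score

# Toy usage: a stand-in "classifier" that scores sequences by their share of "A" tokens.
vocab = ["A", "B", "C"]
score = lambda seq: seq.count("A") / len(seq)
adv, conf = monte_carlo_attack(score, ["A", "A", "B", "A"], vocab)
print(adv, conf)
```

The second approach makes the discrete input differentiable by relaxing the choice of each token into a continuous distribution over the vocabulary, after which standard gradient-based attacks apply. One common realization of this idea is the Gumbel-softmax trick; the sketch below uses it with a stand-in linear classifier purely to illustrate the mechanism, not the paper's exact model or loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, vocab_size = 6, 10
model = torch.nn.Linear(seq_len * vocab_size, 2)  # stand-in classifier over one-hot inputs
for p in model.parameters():
    p.requires_grad_(False)                       # the attack optimizes the input, not the model

orig = torch.randint(vocab_size, (seq_len,))      # original categorical sequence
# Per-position logits over the vocabulary, initialized near the original one-hot encoding.
logits = torch.log(F.one_hot(orig, vocab_size).float() + 1e-3).requires_grad_()
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(100):
    soft = F.gumbel_softmax(logits, tau=0.5)      # differentiable "soft" tokens
    pred = model(soft.flatten())
    # Maximize the loss on the original class (here class 0) to push the prediction away.
    loss = -F.cross_entropy(pred.unsqueeze(0), torch.tensor([0]))
    opt.zero_grad()
    loss.backward()
    opt.step()

adversarial = logits.argmax(dim=-1)               # discretize back to a categorical sequence
print(orig.tolist(), "->", adversarial.tolist())
```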


Notes

  1. The code is available at https://github.com/fursovia/dilma/tree/master. The data is available at https://www.dropbox.com/s/axu26guw2a0mwos/adat_datasets.zip?dl=0.

  2. https://www.kaggle.com/c/python-and-analyze-data-final-project/data.


Acknowledgments

The work presented in Sect. 3 by Alexey Zaytsev was supported by RSF grant 20-71-10135. The work presented in Sect. 4 by Evgeny Burnaev was supported by RFBR grant 20-01-00203.

Author information

Corresponding author

Correspondence to Ivan Fursov.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fursov, I., Zaytsev, A., Kluchnikov, N., Kravchenko, A., Burnaev, E. (2021). Gradient-Based Adversarial Attacks on Categorical Sequence Models via Traversing an Embedded World. In: van der Aalst, W.M.P., et al. (eds.) Analysis of Images, Social Networks and Texts. AIST 2020. Lecture Notes in Computer Science, vol. 12602. Springer, Cham. https://doi.org/10.1007/978-3-030-72610-2_27

  • DOI: https://doi.org/10.1007/978-3-030-72610-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72609-6

  • Online ISBN: 978-3-030-72610-2

  • eBook Packages: Computer Science (R0)
