A Data-Free Substitute Model Training Method for Textual Adversarial Attacks

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1968)

Abstract

BERT and other pre-trained language models are vulnerable to textual adversarial attacks. Current transfer-based textual adversarial attacks in black-box settings rely on real datasets to train substitute models, yet obtaining such datasets can be difficult for attackers. To address this issue, we propose a data-free substitute training method (DaST-T) for textual adversarial attacks, which trains substitute models without any real data. DaST-T consists of two major steps. First, DaST-T builds a specially designed Generative Adversarial Network (GAN) to train the substitute model without real data. The training procedure uses samples synthesized at random by the generative model, with labels supplied by the attacked model. In particular, DaST-T equips the generative model with a data augmenter that encourages rapid exploration of the entire sample space, thereby accelerating substitute-model training. Second, DaST-T applies existing white-box textual adversarial attack methods to the substitute model to generate adversarial text, which is then transferred to the attacked model. DaST-T thus removes the dependence on real datasets in black-box textual adversarial attacks. Experimental results on NLP text classification tasks show that DaST-T achieves superior attack performance compared to other black-box textual adversarial attack baselines while requiring fewer queries to the attacked model.
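
The abstract compresses two nontrivial steps, so a toy rendering may help make them concrete. Below is a minimal, hypothetical PyTorch sketch of Step 1, not the paper's actual code: all names (Generator, Substitute, black_box), sizes, and learning rates are invented, the paper's data augmenter is omitted, and Gumbel-softmax is assumed as one common way to pass gradients through the discrete token-sampling step (the abstract does not specify DaST-T's mechanism).

```python
# Minimal, hypothetical sketch of Step 1 (data-free substitute training).
# All names and sizes here are illustrative assumptions, not the paper's
# actual code; the paper's data augmenter is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, SEQ_LEN, NUM_CLASSES, NOISE_DIM = 5000, 32, 2, 100

class Generator(nn.Module):
    """Maps noise to a soft one-hot token sequence. Gumbel-softmax is
    assumed as one common way to let gradients cross the discrete
    sampling step; the abstract does not say how DaST-T does this."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, 256), nn.ReLU(),
            nn.Linear(256, SEQ_LEN * VOCAB_SIZE))

    def forward(self, z, tau=1.0):
        logits = self.net(z).view(-1, SEQ_LEN, VOCAB_SIZE)
        return F.gumbel_softmax(logits, tau=tau)          # (B, L, V)

class Substitute(nn.Module):
    """A tiny classifier that accepts soft one-hot token sequences."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.emb = nn.Linear(VOCAB_SIZE, emb_dim, bias=False)
        self.clf = nn.Linear(emb_dim, NUM_CLASSES)

    def forward(self, soft_tokens):
        return self.clf(self.emb(soft_tokens).mean(dim=1))

def query_target(black_box, soft_tokens):
    """Discretize the generator output and ask the attacked model for
    hard labels -- the only access a black-box attacker has."""
    with torch.no_grad():
        return black_box(soft_tokens.argmax(dim=-1)).argmax(dim=-1)

# Stand-in for the real attacked model: any callable ids -> logits.
black_box = lambda ids: torch.randn(ids.size(0), NUM_CLASSES)

gen, sub = Generator(), Substitute()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_s = torch.optim.Adam(sub.parameters(), lr=1e-4)

for step in range(1000):
    z = torch.randn(16, NOISE_DIM)
    x = gen(z)
    y = query_target(black_box, x)                        # pseudo-labels
    # Substitute step: imitate the target's decisions on synthetic text.
    loss_s = F.cross_entropy(sub(x.detach()), y)
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    # Generator step: favor samples the substitute still gets wrong,
    # pushing exploration toward regions it has not yet learned.
    loss_g = -F.cross_entropy(sub(gen(z)), y)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Step 2 can then be sketched as a HotFlip-style first-order word substitution run against the trained substitute, after which the adversarial text is sent to the black-box target. This is an illustrative stand-in for whichever white-box attack DaST-T actually plugs in, reusing the toy models above.

```python
# Hypothetical sketch of Step 2: a HotFlip-style first-order word
# substitution against the trained substitute, transferred to the target.
def transfer_attack(sub, black_box, token_ids, labels, n_flips=3):
    x = F.one_hot(token_ids, VOCAB_SIZE).float().requires_grad_(True)
    F.cross_entropy(sub(x), labels).backward()
    # First-order gain of swapping position i to token w':
    # grad[i, w'] - grad[i, current token].
    gain = x.grad - (x.grad * x).sum(-1, keepdim=True)    # (B, L, V)
    adv, rows = token_ids.clone(), torch.arange(token_ids.size(0))
    flat = gain.view(gain.size(0), -1)
    for _ in range(n_flips):                              # gradient is not
        idx = flat.argmax(dim=-1)                         # refreshed here
        pos = torch.div(idx, VOCAB_SIZE, rounding_mode='floor')
        adv[rows, pos] = idx % VOCAB_SIZE
        flat[rows, idx] = float('-inf')                   # avoid repeats
    with torch.no_grad():
        return adv, black_box(adv).argmax(-1)             # transfer query

# Usage: the attack succeeds wherever the target's label flips.
# adv, new_y = transfer_attack(sub, black_box, token_ids, y)
```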

Acknowledgements

This work was supported by the Key R&D Projects in Hubei Province under Grant No. 2022BAA041.

Author information

Corresponding author

Correspondence to Zhidong Shen.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, F., Shen, Z. (2024). A Data-Free Substitute Model Training Method for Textual Adversarial Attacks. In: Luo, B., Cheng, L., Wu, Z.-G., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_23

  • DOI: https://doi.org/10.1007/978-981-99-8181-6_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8180-9

  • Online ISBN: 978-981-99-8181-6

  • eBook Packages: Computer Science, Computer Science (R0)
