Abstract
BERT and other pre-trained language models are vulnerable to textual adversarial attacks. Current transfer-based textual adversarial attacks in black-box settings rely on real datasets to train substitute models, yet such datasets can be difficult for attackers to obtain. To address this issue, we propose a data-free substitute training method (DaST-T) for textual adversarial attacks, which trains substitute models without any real data. DaST-T consists of two major steps. First, DaST-T builds a special Generative Adversarial Network (GAN) to train the substitute model without real data: training uses samples synthesized at random by the generative model, with labels produced by the attacked model. In particular, DaST-T equips the generative model with a data augmenter that encourages rapid exploration of the entire sample space, thereby accelerating substitute-model training. Second, DaST-T applies existing white-box textual adversarial attack methods to the substitute model to generate adversarial text, which is then transferred to the attacked model. DaST-T thus removes the dependence on real datasets in black-box textual adversarial attacks. Experimental results on NLP text classification tasks show that DaST-T achieves better attack performance than other black-box textual adversarial attack baselines while requiring fewer queries.
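The core idea of the first step can be illustrated with a minimal, self-contained sketch. This is a hypothetical toy setup, not the paper's actual GAN/BERT pipeline: a stand-in "generator" synthesizes random samples, the black-box victim model labels them, and a substitute model is fit on those (sample, label) pairs alone, with no access to real data. The `victim`, `generate`, and `train_substitute` names and the logistic-regression substitute are illustrative assumptions.

```python
# Toy sketch of data-free substitute training: query a black-box model
# on synthetic samples and distill its decisions into a substitute.
import math
import random

random.seed(0)

def victim(x):
    """Black-box attacked model: returns only hard labels (toy rule)."""
    return 1 if sum(x) > 0 else 0

def generate(dim=4):
    """Stand-in for the generative model: random synthetic samples."""
    return [random.uniform(-1, 1) for _ in range(dim)]

def train_substitute(n_queries=2000, lr=0.1, dim=4):
    """Fit a logistic-regression substitute on victim-labelled samples."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(n_queries):
        x = generate(dim)
        y = victim(x)                      # one query to the black box
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1 / (1 + math.exp(-z))         # sigmoid
        g = p - y                          # gradient of log-loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g
    return w, b

def substitute(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

w, b = train_substitute()
# Agreement between substitute and victim on fresh synthetic samples;
# high agreement is what makes white-box attacks on the substitute
# transfer to the black-box victim.
test_samples = [generate() for _ in range(500)]
agree = sum(substitute(x, w, b) == victim(x) for x in test_samples) / 500
```

In the paper's setting the generator is itself trained adversarially (and augmented) to produce informative queries, and the substitute is a neural text classifier; the query-then-distill loop above is the shared skeleton.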
Acknowledgements
This work was supported by the Key R&D Projects in Hubei Province under Grant No. 2022BAA041.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Chen, F., Shen, Z. (2024). A Data-Free Substitute Model Training Method for Textual Adversarial Attacks. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_23
DOI: https://doi.org/10.1007/978-981-99-8181-6_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8180-9
Online ISBN: 978-981-99-8181-6
eBook Packages: Computer Science, Computer Science (R0)