Abstract
BERT and other pre-trained language models are vulnerable to textual adversarial attacks. Current transfer-based textual adversarial attacks in black-box settings rely on real datasets to train substitute models, yet such datasets can be difficult for attackers to obtain. To address this issue, we propose a data-free substitute training method (DaST-T) for textual adversarial attacks, which trains substitute models without any real data. DaST-T consists of two major steps. First, DaST-T builds a special Generative Adversarial Network (GAN) to train the substitute model without real data: training uses samples synthesized at random by the generative model, with labels produced by the attacked model. In particular, DaST-T equips the generative model with a data augmenter that encourages rapid exploration of the entire sample space, thereby accelerating substitute-model training. Second, DaST-T applies existing white-box textual adversarial attack methods to the substitute model to generate adversarial text, which is then transferred to the attacked model. DaST-T thus removes the dependence on real datasets in black-box textual adversarial attacks. Experimental results on NLP text classification tasks show that DaST-T achieves better attack performance than other black-box textual adversarial attack baselines while requiring fewer queries.
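The core idea of the first step can be illustrated with a minimal, self-contained sketch. This is a hypothetical toy setup, not the paper's actual GAN/BERT pipeline: a stand-in "generator" synthesizes random samples, the black-box victim model labels them, and a substitute model is fit on those (sample, label) pairs alone, with no access to real data. The `victim`, `generate`, and `train_substitute` names and the logistic-regression substitute are illustrative assumptions.

```python
# Toy sketch of data-free substitute training: query a black-box model
# on synthetic samples and distill its decisions into a substitute.
import math
import random

random.seed(0)

def victim(x):
    """Black-box attacked model: returns only hard labels (toy rule)."""
    return 1 if sum(x) > 0 else 0

def generate(dim=4):
    """Stand-in for the generative model: random synthetic samples."""
    return [random.uniform(-1, 1) for _ in range(dim)]

def train_substitute(n_queries=2000, lr=0.1, dim=4):
    """Fit a logistic-regression substitute on victim-labelled samples."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(n_queries):
        x = generate(dim)
        y = victim(x)                      # one query to the black box
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1 / (1 + math.exp(-z))         # sigmoid
        g = p - y                          # gradient of log-loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g
    return w, b

def substitute(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

w, b = train_substitute()
# Agreement between substitute and victim on fresh synthetic samples;
# high agreement is what makes white-box attacks on the substitute
# transfer to the black-box victim.
test_samples = [generate() for _ in range(500)]
agree = sum(substitute(x, w, b) == victim(x) for x in test_samples) / 500
```

In the paper's setting the generator is itself trained adversarially (and augmented) to produce informative queries, and the substitute is a neural text classifier; the query-then-distill loop above is the shared skeleton.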
Acknowledgements
This work was supported by the Key R&D Projects in Hubei Province under Grant No. 2022BAA041.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Chen, F., Shen, Z. (2024). A Data-Free Substitute Model Training Method for Textual Adversarial Attacks. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_23
DOI: https://doi.org/10.1007/978-981-99-8181-6_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8180-9
Online ISBN: 978-981-99-8181-6
eBook Packages: Computer Science, Computer Science (R0)