Abstract
The lack of robustness is a serious problem for deep neural networks (DNNs), leaving them vulnerable to adversarial examples. A promising remedy is adversarial training, which lets the model learn features from adversarial examples. However, adversarial training often produces overfitted models and may fail when facing a new attack. We believe this is because previous adversarial training with cross-entropy loss ignores the similarity between adversarial examples and their original counterparts, which results in a low margin. Accordingly, we propose a supervised adversarial contrastive learning (SACL) approach for adversarial training. SACL uses a supervised adversarial contrastive loss that combines a cross-entropy term with an adversarial contrastive term. The cross-entropy term guides the DNN's inductive bias learning, while the adversarial contrastive term helps the model learn example representations by maximizing feature consistency between adversarial examples and their originals, which fits well with the goal of resolving low margins. In addition, SACL trains only on adversarial examples that successfully fool the model, together with their corresponding original examples. This provides the model with more accurate information about the decision boundary and yields a model that better fits the example distribution. Experiments show that SACL reduces the attack success rate of multiple adversarial attack algorithms against different models on text classification tasks. Its defensive performance is significantly better than that of other adversarial training approaches, without reducing the model's generalization ability. Moreover, DNN models trained with our approach exhibit high transferability and robustness.
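The abstract does not give the exact formulation of the loss, so the following NumPy sketch illustrates one plausible form: a cross-entropy term plus a supervised contrastive term computed over a batch that stacks original examples with their adversarial counterparts (which share labels). The function names `sacl_loss`, `sup_contrastive`, the weight `lam`, and the temperature `tau` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def sup_contrastive(features, labels, tau=0.1):
    """Supervised contrastive term (in the spirit of Khosla et al., 2020):
    pull together representations that share a label, push apart the rest."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau                         # scaled pairwise cosine similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)     # exclude self-comparisons
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask  # same-label pairs
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()

def sacl_loss(logits, features, labels, lam=1.0, tau=0.1):
    """Combined objective: cross-entropy guides classification, while the
    contrastive term ties adversarial representations to their originals.
    `logits`/`features` are assumed to stack original and successful
    adversarial examples, with `labels` repeated accordingly."""
    return cross_entropy(logits, labels) + lam * sup_contrastive(features, labels, tau)
```

Under this reading, an adversarial example and its original enter the batch with the same label, so the contrastive term directly rewards consistent features for the pair while pushing other-class representations apart, which is the margin-enlarging effect the abstract describes.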





Data availability
The datasets generated during and/or analysed during the current study are available in the GitHub repository, [https://github.com/chrisli1995/paper/tree/main/SACL].
Acknowledgements
This work is supported by the Joint Funds of the National Natural Science Foundation of China (U1936122) and the Primary Research & Development Plan of Hubei Province (2020BAB101).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Financial interest
The authors have no relevant financial or non-financial interests to disclose, and certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, W., Zhao, B., An, Y. et al. Supervised contrastive learning for robust text adversarial training. Neural Comput & Applic 35, 7357–7368 (2023). https://doi.org/10.1007/s00521-022-07871-5