
An adversarial-example generation method for Chinese sentiment tendency classification based on audiovisual confusion and contextual association

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Methods for generating adversarial examples have been explored extensively on English data, while research on Chinese adversarial examples remains limited. Moreover, existing Chinese adversarial attack methods often rely on a single form of perturbation, produce insufficiently varied expressions, and leave room for improvement in attack effectiveness. This paper therefore proposes SentiAttack, a method that introduces six perturbation types, tailored to the characteristics of Chinese, from two perspectives: audiovisual deception (similar-sounding words, visually similar Chinese characters, horizontal splitting of a Chinese character, and reversal of adjacent characters within a word) and contextualized generation (word generation with WoBERT-MLM (Su in Wobert: Word-based chinese bert model - zhuiyiai. Technical report, 2020. https://github.com/ZhuiyiTechnology/WoBERT) and sentence-piece generation with LongLM (Guan et al. in Trans Assoc Comput Linguist 10:434–451, 2022. https://doi.org/10.1162/tacl_a_00469)). In addition, a "fluency" metric is introduced to further measure the quality of the adversarial examples. We conducted experiments on five datasets (CH-SIMS, ChnSentiCorp, online shopping, waimai, and weibo). Under effective constraints on semantic similarity, expression fluency and perturbation rate, the attack reduced classifier accuracy by 74.40%, 49.10%, 42.90%, 39.90% and 66.20%, respectively.
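To make the "audiovisual deception" perturbations concrete, the following is a minimal, hypothetical Python sketch of two of them (horizontal splitting of a character and reversal of adjacent characters). The `SPLIT_TABLE` here is an illustrative stand-in for a real decomposition resource such as the chaizi dictionary listed in the Notes; it is not the authors' implementation.

```python
# Sketch of two "audiovisual deception" perturbations, assuming a
# character-level view of the input sentence.

# Hypothetical mini decomposition table; a real attack would use a
# full resource such as the chaizi dictionary (see Notes).
SPLIT_TABLE = {
    "好": "女子",   # "good" splits horizontally into "woman" + "child"
    "明": "日月",   # "bright" splits into "sun" + "moon"
}

def split_char(text: str, index: int) -> str:
    """Replace the character at `index` with its horizontal components."""
    parts = SPLIT_TABLE.get(text[index])
    if parts is None:
        return text  # no known decomposition; leave the text unchanged
    return text[:index] + parts + text[index + 1:]

def swap_adjacent(text: str, index: int) -> str:
    """Reverse the order of the characters at `index` and `index + 1`."""
    if index + 1 >= len(text):
        return text
    return text[:index] + text[index + 1] + text[index] + text[index + 2:]

print(split_char("心情很好", 3))    # → 心情很女子
print(swap_adjacent("非常好吃", 0))  # → 常非好吃
```

Both perturbations preserve the visual or phonetic gist a human reader recovers while changing the character sequence a classifier sees; in the full method they would be applied selectively, subject to the semantic-similarity, fluency, and perturbation-rate constraints described in the abstract.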


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. IDEA-CCNL (2022) Fengshenbang-LM. https://github.com/IDEA-CCNL/Fengshenbang-LM. Accessed 1 October 2022.

  2. The Scalable Knowledge Intelligence team at IBM Almaden Research Center (2018) DimSim. https://github.com/System-T/DimSim. Accessed 1 October 2022.

  3. XiaoFang (2017) SimilarCharacter. https://github.com/contr4l/SimilarCharacter. Accessed 1 October 2022.

  4. QQXIUZI (2008) chaizi. https://www.qqxiuzi.cn/zh/chaizi.htm. Accessed 1 October 2022.

  5. SophonPlus (2017) ChnSentiCorp_htl_all. https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/ChnSentiCorp_htl_all. Accessed 1 October 2022.

  6. SophonPlus (2017) online shopping_10_cats. https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/online_shopping_10_cats. Accessed 1 October 2022.

  7. SophonPlus (2017) waimai_10k. https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets/waimai_10k. Accessed 1 October 2022.

  8. CCF TCCI (2014) Emotion Analysis in Chinese Weibo Texts. http://tcci.ccf.org.cn/conference/2014/pages/page04_sam.html#. Accessed 1 October 2022.

  9. Baidu AI (2021) Short text similarity. https://ai.baidu.com/ai-doc/NLP/ek6z52frp. Accessed 1 October 2022.

  10. Baidu AI (2021) Chinese DNN language model. https://ai.baidu.com/ai-doc/NLP/0k6z52fb4. Accessed 1 October 2022.

References

  1. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In: 2nd international conference on learning representations, ICLR 2014, Banff, April 14–16, 2014, conference track proceedings, http://arxiv.org/abs/1312.6199

  2. Goodfellow I, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: 3rd international conference on learning representations, ICLR 2015, San Diego, May 7–9, 2015, conference track proceedings. http://arxiv.org/abs/1412.6572

  3. Nguyen A, Yosinski J, Clune J (2015) Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, June 7–12, pp 427–436. https://doi.org/10.1109/CVPR.2015.7298640

  4. Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D (2021) A survey on adversarial attacks and defences. CAAI Trans Intell Technol 6(1):25–45. https://doi.org/10.1049/cit2.12028


  5. Wang W, Wang R, Wang L, Tang B (2019) Adversarial examples generation approach for tendency classification on chinese texts. Ruan Jian Xue Bao/J Softw 30(08):2415–2427


  6. Tong X, Wang L, Wang R, Wang J (2020) A generation method of word-level adversarial samples for chinese text classification. Netinfo Secur 20(09):12–16


  7. Li L, Shao Y, Song D, Qiu X, Huang X (2020) Generating adversarial examples in chinese texts using sentence-pieces. CoRR, abs/2012.14769. https://arxiv.org/abs/2012.14769

  8. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, June 2–7, 2019, Vol. 1 (Long and Short Papers), pp 4171–4186. https://doi.org/10.18653/v1/n19-1423

  9. Ou H, Yu L, Tian S, Chen X (2022) Chinese adversarial examples generation approach with multi-strategy based on semantic. Knowl Inf Syst 64(4):1101–1119. https://doi.org/10.1007/s10115-022-01652-1


  10. Jin D, Jin Z, Zhou JT, Szolovits P (2020) Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. In: The thirty-fourth AAAI conference on artificial intelligence, AAAI 2020, the thirty-second innovative applications of artificial intelligence conference, IAAI 2020, The tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, pp 8018–8025. https://ojs.aaai.org/index.php/AAAI/article/view/6311

  11. Eke CI, Norman AA, Shuib L, Nweke HF (2019) A survey of user profiling: state-of-the-art, challenges, and solutions. IEEE Access 7:144907–144924. https://doi.org/10.1109/ACCESS.2019.2944243


  12. Kaddoura S, Chandrasekaran G, Popescu DE, Duraisamy JH (2022) A systematic literature review on spam content detection and classification. PeerJ Comput Sci 8:e830. https://doi.org/10.7717/peerj-cs.830


  13. Wang W, Tang B, Wang R, Wang L, Ye A (2019) A survey on adversarial attacks and defenses in text. CoRR, abs/1902.07285

  14. Papernot N, McDaniel PD, Wu X, Jha S, Swami A (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In: IEEE symposium on security and privacy, SP 2016, San Jose, pp 582–597. https://doi.org/10.1109/SP.2016.41

  15. Belinkov Y, Bisk Y (2018) Synthetic and natural noise both break neural machine translation. In: 6th international conference on learning representations, ICLR 2018, Vancouver, Conference track proceedings. https://openreview.net/forum?id=BJ8vJebC-

  16. Gao J, Lanchantin J, Soffa ML, Qi Y (2018) Black-box generation of adversarial text sequences to evade deep learning classifiers. In: 2018 IEEE security and privacy workshops, SP workshops 2018, San Francisco, pp 50–56. https://doi.org/10.1109/SPW.2018.00016

  17. Li J, Ji S, Du T, Li B, Wang T (2019) Textbugger: generating adversarial text against real-world applications. In: 26th annual network and distributed system security symposium, NDSS 2019, San Diego, https://www.ndss-symposium.org/ndss-paper/textbugger-generating-adversarial-text-against-real-world-applications/

  18. Samanta S, Mehta S (2017) Towards crafting text adversarial samples. CoRR, abs/1707.02812

  19. Liang B, Li H, Su M, Bian P, Li X, Shi W (2018) Deep text classification can be fooled. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, Stockholm, pp 4208–4215, https://doi.org/10.24963/ijcai.2018/585

  20. Alzantot M, Sharma Y, Elgohary A, Ho B-J, Srivastava MB, Chang K-W (2018) Generating natural language adversarial examples. In: Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, pp 2890–2896. https://doi.org/10.18653/v1/d18-1316

  21. Ren S, Deng Y, He K, Che W (2019) Generating natural language adversarial examples through probability weighted word saliency. In: Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Vol. 1, Long Papers, pp 1085–1097, https://doi.org/10.18653/v1/p19-1103

  22. Zang Y, Qi F, Yang C, Liu Z, Zhang M, Liu Q, Sun M (2020) Word-level textual adversarial attacking as combinatorial optimization. In: Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, pp 6066–6080. https://doi.org/10.18653/v1/2020.acl-main.540

  23. Garg S, Ramakrishnan G (2020) BAE: bert-based adversarial examples for text classification. In: Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, Online, pp 6174–6181. https://doi.org/10.18653/v1/2020.emnlp-main.498

  24. Li L, Ma R, Guo Q, Xue X, Qiu X (2020) BERT-ATTACK: adversarial attack against BERT using BERT. In: Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, pp 6193–6202. https://doi.org/10.18653/v1/2020.emnlp-main.500

  25. Jia R, Liang P (2017) Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the 2017 conference on empirical methods in natural language processing, EMNLP 2017, Copenhagen, pp 2021–2031. Association for computational linguistics. https://doi.org/10.18653/v1/d17-1215

  26. Ribeiro MT, Singh S, Guestrin C (2018) Semantically equivalent adversarial rules for debugging NLP models. In: Proceedings of the 56th annual meeting of the association for computational linguistics, ACL 2018, Melbourne, Association for computational linguistics, pp 856–865, https://doi.org/10.18653/v1/P18-1079

  27. Iyyer M, Wieting J, Gimpel K, Zettlemoyer L (2018) Adversarial example generation with syntactically controlled paraphrase networks. In: Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2018, New Orleans, Louisiana, Association for Computational Linguistics, pp 1875–1885. https://doi.org/10.18653/v1/n18-1170

  28. Wang T, Wang X, Qin Y, Packer B, Li K, Chen J, Beutel A, Chi EH (2020) Cat-gen: Improving robustness in NLP models via controlled adversarial text generation. In: Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, Association for computational linguistics, pp 5141–5146. https://doi.org/10.18653/v1/2020.emnlp-main.417

  29. Xu L, Veeramachaneni K (2021) Attacking text classifiers via sentence rewriting sampler. CoRR, abs/2104.08453

  30. Hossam M, Le T, Zhao H, Phung D (2020) Explain2attack: text adversarial attacks via cross-domain interpretability. In: 25th international conference on pattern recognition, ICPR 2020, Virtual Event/Milan, pp 8922–8928, https://doi.org/10.1109/ICPR48806.2021.9412526

  31. Jiang L (2017) The effect of placement of character and word in chinese reading: an eyetracking study. FuJian Normal University, pp 22–25


  32. Cui Y, Che W, Liu T, Qin B, Yang Z (2021) Pre-training with whole word masking for Chinese BERT. IEEE ACM Trans Audio Speech Lang Process 29:3504–3514. https://doi.org/10.1109/TASLP.2021.3124365


  33. Zhang X, Li P, Li H (2021) AMBERT: a pre-trained language model with multi-grained tokenization. In: Findings of the association for computational linguistics: ACL/IJCNLP 2021, Online Event, volume ACL/IJCNLP 2021 of Findings of ACL, pp 421–435. https://doi.org/10.18653/v1/2021.findings-acl.37

  34. Su J (2020) Speeding up without reducing accuracy: Chinese wobert based on word granularity. https://spaces.ac.cn/archives/7758. Accessed 1 Oct 2022

  35. Yu W, Xu H, Meng F, Zhu Y, Ma Y, Wu J, Zou J, Yang K (2020) CH-SIMS: a chinese multimodal sentiment analysis dataset with fine-grained annotation of modality. In: Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, pp 3718–3727. https://doi.org/10.18653/v1/2020.acl-main.343


Author information

Authors and Affiliations

Authors

Contributions

Hongxu Ou and Long Yu wrote the main manuscript text, and Shengwei Tian and Xin Chen prepared the tables, figures and other materials. Chen Shi revised the paper. All authors participated in the discussion of the problem, the experimental design and the review of the text.

Corresponding author

Correspondence to Long Yu.

Ethics declarations

Conflict of interest

We all declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (Grant Numbers 61962057, U2003208); Xinjiang Key R & D Project (Grant Number 2021B01002).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ou, H., Yu, L., Tian, S. et al. An adversarial-example generation method for Chinese sentiment tendency classification based on audiovisual confusion and contextual association. Knowl Inf Syst 65, 5231–5258 (2023). https://doi.org/10.1007/s10115-023-01946-y

