An adversarial-example generation method for Chinese sentiment tendency classification based on audiovisual confusion and contextual association

Ou, Hongxu; Yu, Long; Tian, Shengwei; Chen, Xin; Shi, Chen; Wang, Bo; Zhou, Tiejun

doi:10.1007/s10115-023-01946-y

Regular Paper
Published: 08 August 2023

Volume 65, pages 5231–5258, (2023)
Knowledge and Information Systems

Hongxu Ou¹˒², Long Yu³, Shengwei Tian¹, Xin Chen¹˒², Chen Shi⁴, Bo Wang¹ & Tiejun Zhou⁵

Abstract

Adversarial-example generation has been explored mainly on English data, while research on Chinese adversarial examples remains very limited. Existing Chinese adversarial attack methods typically rely on a single form of generation, produce examples with limited variety of expression, and leave room for improvement in attack effectiveness. This paper therefore proposes SentiAttack, a method that introduces six perturbations from two perspectives, according to the characteristics of Chinese. Four perturbations come from audiovisual deception (words with similar sound, Chinese characters with similar form, horizontal splitting of Chinese characters, and reversing the order of adjacent Chinese characters within a word), and two come from contextualized generation (WoBERT-MLM (Su in Wobert: Word-based chinese bert model - zhuiyiai. Technical report, 2020. https://github.com/ZhuiyiTechnology/WoBERT) word generation and LongLM (Guan et al. in Trans Assoc Comput Linguist 10:434–451, 2022. https://doi.org/10.1162/tacl_a_00469) sentence-piece generation). In addition, a “fluency” metric is added to further measure the quality of the adversarial examples. We conducted experiments on five datasets (CH-SIMS 3, ChnSentiCorp, online shopping, waimai, and weibo8). Under effective constraints on semantic similarity, expression fluency and perturbation, we obtained accuracy decreases of 74.40%, 49.10%, 42.90%, 39.90% and 66.20%, respectively.
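As a rough illustration of the four audiovisual perturbations named in the abstract, the following minimal Python sketch generates candidate variants of a single word. It is not the authors' implementation: the three lookup tables are tiny hypothetical stand-ins for the kind of similar-sound, similar-form and character-splitting resources listed in the Notes, and the function names `swap_adjacent`, `substitute` and `candidates` are invented here. How SentiAttack selects words to perturb, and how it applies the semantic-similarity and fluency constraints, is not described in the abstract and is not modelled.

```python
# Toy lookup tables (hypothetical; a real attack would load far larger resources).
SIMILAR_SOUND = {"心": "新"}          # both pronounced "xin"
SIMILAR_FORM = {"未": "末", "己": "已"}  # visually similar characters
SPLIT = {"好": "女子", "明": "日月"}     # left-right (horizontal) decomposition


def swap_adjacent(word: str, i: int = 0) -> str:
    """Reverse the order of the adjacent characters at positions i and i+1."""
    if not 0 <= i < len(word) - 1:
        return word  # nothing to swap (word too short or index out of range)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def substitute(word: str, table: dict[str, str]) -> str:
    """Replace the first character that has an entry in `table`."""
    for idx, ch in enumerate(word):
        if ch in table:
            return word[:idx] + table[ch] + word[idx + 1:]
    return word  # no character in this word is covered by the table


def candidates(word: str) -> dict[str, str]:
    """Return every perturbation that actually changes the word."""
    variants = {
        "similar_sound": substitute(word, SIMILAR_SOUND),
        "similar_form": substitute(word, SIMILAR_FORM),
        "split": substitute(word, SPLIT),
        "swap": swap_adjacent(word),
    }
    return {name: v for name, v in variants.items() if v != word}


if __name__ == "__main__":
    for name, variant in candidates("好心").items():
        print(name, variant)
```

In a full attack, each candidate would then be scored against the target classifier and filtered by the similarity, fluency and perturbation constraints mentioned above; the two contextualized perturbations (WoBERT-MLM and LongLM generation) require pretrained models and are left out of this sketch.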

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

IDEA-CCNL (2022) Fengshenbang-LM. https://github.com/IDEA-CCNL/Fengshenbang-LM. Accessed 1 October 2022.
The Scalable Knowledge Intelligence team at IBM Almaden Research Center (2018) DimSim. https://github.com/System-T/DimSim. Accessed 1 October 2022.
XiaoFang (2017) SimilarCharacter. https://github.com/contr4l/SimilarCharacter. Accessed 1 October 2022.
QQXIUZI (2008) chaizi. https://www.qqxiuzi.cn/zh/chaizi.htm. Accessed 1 October 2022.
SophonPlus (2017) ChnSentiCorp_htl_all. https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/ChnSentiCorp_htl_all. Accessed 1 October 2022.
SophonPlus (2017) online shopping_10_cats. https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/online_shopping_10_cats. Accessed 1 October 2022.
SophonPlus (2017) waimai_10k. https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets/waimai_10k. Accessed 1 October 2022.
CCF TCCI (2014) Emotion Analysis in Chinese Weibo Texts. http://tcci.ccf.org.cn/conference/2014/pages/page04_sam.html#. Accessed 1 October 2022.
Baidu AI (2021) Short text similarity. https://ai.baidu.com/ai-doc/NLP/ek6z52frp. Accessed 1 October 2022.
Baidu AI (2021) Chinese DNN language model. https://ai.baidu.com/ai-doc/NLP/0k6z52fb4. Accessed 1 October 2022.

References

Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In: 2nd international conference on learning representations, ICLR 2014, Banff, April 14–16, 2014, conference track proceedings, http://arxiv.org/abs/1312.6199
Goodfellow I, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: 3rd international conference on learning representations, ICLR 2015, San Diego, May 7–9, 2015, conference track proceedings. http://arxiv.org/abs/1412.6572
Nguyen A, Yosinski J, Clune J (2015) Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, June 7–12, pp 427–436. https://doi.org/10.1109/CVPR.2015.7298640
Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D (2021) A survey on adversarial attacks and defences. CAAI Trans Intell Technol 6(1):25–45. https://doi.org/10.1049/cit2.12028
Wang W, Wang R, Wang L, Tang B (2019) Adversarial examples generation approach for tendency classification on Chinese texts. Ruan Jian Xue Bao/J Softw 30(08):2415–2427
Tong X, Wang L, Wang R, Wang J (2020) A generation method of word-level adversarial samples for Chinese text classification. Netinfo Secur 20(09):12–16
Li L, Shao Y, Song D, Qiu X, Huang X (2020) Generating adversarial examples in Chinese texts using sentence-pieces. CoRR, abs/2012.14769. https://arxiv.org/abs/2012.14769
Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, June 2–7, 2019, Vol. 1 (Long and Short Papers), pp 4171–4186. https://doi.org/10.18653/v1/n19-1423
Ou H, Yu L, Tian S, Chen X (2022) Chinese adversarial examples generation approach with multi-strategy based on semantic. Knowl Inf Syst 64(4):1101–1119. https://doi.org/10.1007/s10115-022-01652-1
Jin D, Jin Z, Zhou JT, Szolovits P (2020) Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. In: The thirty-fourth AAAI conference on artificial intelligence, AAAI 2020, the thirty-second innovative applications of artificial intelligence conference, IAAI 2020, The tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, pp 8018–8025. https://ojs.aaai.org/index.php/AAAI/article/view/6311
Eke CI, Norman AA, Shuib L, Nweke HF (2019) A survey of user profiling: state-of-the-art, challenges, and solutions. IEEE Access 7:144907–144924. https://doi.org/10.1109/ACCESS.2019.2944243
Kaddoura S, Chandrasekaran G, Popescu DE, Duraisamy JH (2022) A systematic literature review on spam content detection and classification. PeerJ Comput Sci 8:e830. https://doi.org/10.7717/peerj-cs.830
Wang W, Tang B, Wang R, Wang L, Ye A (2019) A survey on adversarial attacks and defenses in text. CoRR, abs/1902.07285
Papernot N, McDaniel PD, Wu X, Jha S, Swami A (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In: IEEE symposium on security and privacy, SP 2016, San Jose, pp 582–597. https://doi.org/10.1109/SP.2016.41
Belinkov Y, Bisk Y (2018) Synthetic and natural noise both break neural machine translation. In: 6th international conference on learning representations, ICLR 2018, Vancouver, Conference track proceedings. https://openreview.net/forum?id=BJ8vJebC-
Gao J, Lanchantin J, Soffa ML, Qi Y (2018) Black-box generation of adversarial text sequences to evade deep learning classifiers. In: 2018 IEEE security and privacy workshops, SP workshops 2018, San Francisco, pp 50–56. https://doi.org/10.1109/SPW.2018.00016
Li J, Ji S, Du T, Li B, Wang T (2019) Textbugger: generating adversarial text against real-world applications. In: 26th annual network and distributed system security symposium, NDSS 2019, San Diego, https://www.ndss-symposium.org/ndss-paper/textbugger-generating-adversarial-text-against-real-world-applications/
Samanta S, Mehta S (2017) Towards crafting text adversarial samples. CoRR, abs/1707.02812
Liang B, Li H, Su M, Bian P, Li X, Shi W (2018) Deep text classification can be fooled. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, Stockholm, pp 4208–4215, https://doi.org/10.24963/ijcai.2018/585
Alzantot M, Sharma Y, Elgohary A, Ho B-J, Srivastava MB, Chang K-W (2018) Generating natural language adversarial examples. In: Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, pp 2890–2896. https://doi.org/10.18653/v1/d18-1316
Ren S, Deng Y, He K, Che W (2019) Generating natural language adversarial examples through probability weighted word saliency. In: Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Vol. 1, Long Papers, pp 1085–1097, https://doi.org/10.18653/v1/p19-1103
Zang Y, Qi F, Yang C, Liu Z, Zhang M, Liu Q, Sun M (2020) Word-level textual adversarial attacking as combinatorial optimization. In: Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, pp 6066–6080. https://doi.org/10.18653/v1/2020.acl-main.540
Garg S, Ramakrishnan G (2020) BAE: bert-based adversarial examples for text classification. In: Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, Online, pp 6174–6181. https://doi.org/10.18653/v1/2020.emnlp-main.498
Li L, Ma R, Guo Q, Xue X, Qiu X (2020) BERT-ATTACK: adversarial attack against BERT using BERT. In: Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, pp 6193–6202. https://doi.org/10.18653/v1/2020.emnlp-main.500
Jia R, Liang P (2017) Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the 2017 conference on empirical methods in natural language processing, EMNLP 2017, Copenhagen, pp 2021–2031. Association for computational linguistics. https://doi.org/10.18653/v1/d17-1215
Ribeiro MT, Singh S, Guestrin C (2018) Semantically equivalent adversarial rules for debugging NLP models. In: Proceedings of the 56th annual meeting of the association for computational linguistics, ACL 2018, Melbourne, Association for computational linguistics, pp 856–865, https://doi.org/10.18653/v1/P18-1079
Iyyer M, Wieting J, Gimpel K, Zettlemoyer L (2018) Adversarial example generation with syntactically controlled paraphrase networks. In: Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2018, New Orleans, Louisiana, Association for Computational Linguistics, pp 1875–1885. https://doi.org/10.18653/v1/n18-1170
Wang T, Wang X, Qin Y, Packer B, Li K, Chen J, Beutel A, Chi EH (2020) Cat-gen: Improving robustness in NLP models via controlled adversarial text generation. In: Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, Association for computational linguistics, pp 5141–5146. https://doi.org/10.18653/v1/2020.emnlp-main.417
Xu L, Veeramachaneni K (2021) Attacking text classifiers via sentence rewriting sampler. CoRR, abs/2104.08453
Hossam M, Le T, Zhao H, Phung D (2020) Explain2attack: text adversarial attacks via cross-domain interpretability. In: 25th international conference on pattern recognition, ICPR 2020, Virtual Event/Milan, pp 8922–8928, https://doi.org/10.1109/ICPR48806.2021.9412526
Jiang L (2017) The effect of placement of character and word in Chinese reading: an eyetracking study. FuJian Normal University, pp 22–25
Cui Y, Che W, Liu T, Qin B, Yang Z (2021) Pre-training with whole word masking for Chinese BERT. IEEE ACM Trans Audio Speech Lang Process 29:3504–3514. https://doi.org/10.1109/TASLP.2021.3124365
Zhang X, Li P, Li H (2021) AMBERT: a pre-trained language model with multi-grained tokenization. In: Findings of the association for computational linguistics: ACL/IJCNLP 2021, Online Event, volume ACL/IJCNLP 2021 of Findings of ACL, pp 421–435. https://doi.org/10.18653/v1/2021.findings-acl.37
Su J (2020) Speeding up without reducing accuracy: Chinese wobert based on word granularity. https://spaces.ac.cn/archives/7758. Accessed 1 Oct 2022
Yu W, Xu H, Meng F, Zhu Y, Ma Y, Wu J, Zou J, Yang K (2020) CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality. In: Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, pp 3718–3727. https://doi.org/10.18653/v1/2020.acl-main.343

Author information

Authors and Affiliations

Software College, Xinjiang University, Ürümqi, 830008, China
Hongxu Ou, Shengwei Tian, Xin Chen & Bo Wang
The Key Laboratory of Software Engineering, Xinjiang University, Ürümqi, 830008, China
Hongxu Ou & Xin Chen
Network Center, Xinjiang University, Ürümqi, 830046, China
Long Yu
Mathematics and Systems Science, Xinjiang University, Ürümqi, 830046, China
Chen Shi
Internet Information Center, Xinjiang, 830000, China
Tiejun Zhou

Contributions

Hongxu Ou and Long Yu wrote the main manuscript text, and Shengwei Tian and Xin Chen prepared the forms, pictures and other materials. Chen Shi revised the paper. All authors participated in the discussion of the problem, experimental design and the review of the text.

Corresponding author

Correspondence to Long Yu.

Ethics declarations

Conflict of interest

We all declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (Grant Numbers 61962057, U2003208); Xinjiang Key R & D Project (Grant Number 2021B01002).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ou, H., Yu, L., Tian, S. et al. An adversarial-example generation method for Chinese sentiment tendency classification based on audiovisual confusion and contextual association. Knowl Inf Syst 65, 5231–5258 (2023). https://doi.org/10.1007/s10115-023-01946-y

Received: 05 December 2022
Revised: 23 May 2023
Accepted: 15 July 2023
Published: 08 August 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10115-023-01946-y
