DOI: 10.1145/3456529.3456556
research-article

WordErrorSim: An Adversarial Examples Generation Method in Chinese by Erroneous Knowledge

Published: 13 July 2021

ABSTRACT

Deep learning is widely used in many kinds of network applications, but errors in text degrade model performance. To study the impact of erroneous knowledge on Chinese text classification models, we propose WordErrorSim, a word-level black-box method for generating adversarial examples. The algorithm uses SIGHAN Bake-off 2013 and ZDIC to construct an erroneous knowledge space. Adversarial examples are generated by replacing a character with a similar-sounding character, a similar-looking character, or another commonly used erroneous character. The adversarial examples achieve the attack without changing the semantics or grammar of the original sentence. In addition, we design and implement the Chinese Pronunciation-Shape Similarity (CPSS) algorithm to measure the change between an adversarial example and the original text. We verify the effectiveness of the proposed method on a real dataset with Bi-LSTM and TextCNN models. The experimental results show that our method significantly reduces the models' classification accuracy.
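The substitution step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `CONFUSION` table is toy data standing in for the erroneous knowledge space built from SIGHAN Bake-off 2013 and ZDIC, `toy_score` is a hypothetical stand-in for a black-box classifier's confidence, and the greedy search order is an assumption, since the abstract does not say how replacement positions are chosen. CPSS is not modelled here.

```python
from typing import Callable, Dict, List

# Hypothetical confusion sets: each character maps to similar-sounding or
# similar-looking characters. These entries are toy data, not taken from
# SIGHAN Bake-off 2013 or ZDIC.
CONFUSION: Dict[str, List[str]] = {
    "很": ["狠", "恨"],
    "好": ["号"],
}


def candidates(text: str, pos: int) -> List[str]:
    """Return every variant of text with the character at pos replaced."""
    subs = CONFUSION.get(text[pos], [])
    return [text[:pos] + sub + text[pos + 1:] for sub in subs]


def attack(text: str, label: int,
           score: Callable[[str, int], float], max_edits: int = 2) -> str:
    """Greedy black-box search (an assumed strategy): repeatedly apply the
    single substitution that most lowers the score of the true label."""
    current = text
    for _ in range(max_edits):
        options = [c for i in range(len(current)) for c in candidates(current, i)]
        if not options:
            break
        best = min(options, key=lambda c: score(c, label))
        if score(best, label) >= score(current, label):
            break  # no substitution lowers the score any further
        current = best
    return current


def toy_score(text: str, label: int) -> float:
    """Stand-in for a classifier's confidence in label; not a real model."""
    positive = sum(text.count(ch) for ch in "很好") / max(len(text), 1)
    return positive if label == 1 else 1.0 - positive


adversarial = attack("这个很好", 1, toy_score)
print(adversarial)
```

Only the scores returned by the model are queried, which is what makes the search black-box; each substitution keeps the sentence length unchanged because one character replaces one character.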


  • Published in

    ACM Other conferences
    ICCDA '21: Proceedings of the 2021 5th International Conference on Compute and Data Analysis
    February 2021
    194 pages
    ISBN:9781450389112
    DOI:10.1145/3456529

    Copyright © 2021 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited