ABSTRACT
Chinese grammatical error correction (CGEC) is a fundamental application of natural language processing: it detects and corrects grammatical errors in text. In this paper, we propose a Hybrid Chinese Grammar Error Checking Model based on the Transformer. First, the model introduces the BERT pre-trained model to process text sequences at the Chinese character level. Next, we integrate pointwise mutual information (PMI) so that the model can capture semantic collocation information between Chinese characters. We benchmark our model against a baseline model on the CGED_2018 dataset and the Corrupt dataset. The proposed method outperforms the baseline model and obtains state-of-the-art results on the CGED_2018 dataset.
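To make the PMI component concrete, the following is a minimal sketch of how PMI between adjacent Chinese characters could be estimated from raw text, using the standard definition PMI(a, b) = log2(P(a, b) / (P(a) P(b))). It is an illustration only and not the authors' implementation: the function name `char_bigram_pmi`, the toy corpus, and the use of unsmoothed adjacent-character counts are assumptions, and the abstract does not describe how the PMI scores are fed into the Transformer.

```python
import math
from collections import Counter


def char_bigram_pmi(sentences):
    """Estimate PMI for every adjacent character pair seen in `sentences`.

    Hypothetical helper for illustration: probabilities are plain relative
    frequencies, with no smoothing and no frequency cutoff.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        unigrams.update(sentence)                      # count single characters
        bigrams.update(zip(sentence, sentence[1:]))    # count adjacent pairs

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / total_bigrams
        p_a = unigrams[a] / total_unigrams
        p_b = unigrams[b] / total_unigrams
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Toy corpus (an assumption, chosen only to show the effect).
corpus = ["我喜欢苹果", "我喜欢香蕉", "他吃苹果"]
scores = char_bigram_pmi(corpus)

# Characters that form a word ("苹果") should score higher than a pair
# that merely straddles a word boundary ("欢" + "苹").
print(scores[("苹", "果")], scores[("欢", "苹")])
```

On this toy corpus, the pair inside the word 苹果 receives a higher score than the pair 欢苹 that crosses a word boundary, which is the kind of collocation signal the abstract refers to. A practical system would estimate these statistics from a large corpus and would likely apply smoothing or a minimum-count threshold.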