
Chinese Spelling Error Detection and Correction Based on Knowledge Graph

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13248)

Abstract

Spelling error correction is the task of detecting and correcting errors in a natural language sentence. In this paper, we consider Chinese spelling error correction (CSC) for generality. A previous state-of-the-art method for this task connects a detection network with a BERT-based correction network via soft masking. This method addresses BERT's insufficient capability to detect the position of errors. However, by analyzing its results, we find that it still lacks sufficient inference ability and world knowledge. To solve this issue, we propose a novel correction approach based on knowledge graphs (KGs), which queries triples from KGs and injects them into the sentences as domain knowledge. Moreover, we leverage MLM as correction to improve the inference ability of BERT and adopt a denoising filter to increase the accuracy of the results. Experimental results on the SIGHAN dataset verify that our approach outperforms state-of-the-art methods.
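The abstract describes injecting KG triples into the input sentence as extra context. As a minimal illustrative sketch (the function name, triple format, and bracketed injection syntax are assumptions, not the paper's actual implementation), the idea of appending matched triples after entity mentions could look like:

```python
def inject_triples(sentence, kg):
    """Append matching KG triples after each entity mention in the sentence.

    kg maps an entity string to a list of (relation, object) pairs.
    Triples are injected inline in brackets; a real system would instead
    feed them to the model via a dedicated input structure.
    """
    out = []
    i = 0
    while i < len(sentence):
        matched = None
        # Greedily match the longest known entity starting at position i.
        for entity in sorted(kg, key=len, reverse=True):
            if sentence.startswith(entity, i):
                matched = entity
                break
        if matched:
            facts = "".join(f"[{rel}{obj}]" for rel, obj in kg[matched])
            out.append(matched + facts)
            i += len(matched)
        else:
            out.append(sentence[i])
            i += 1
    return "".join(out)


# Example: inject the (capital-of, China) fact after the mention of Beijing.
kg = {"北京": [("首都", "中国")]}
print(inject_triples("我在北京工作", kg))  # 我在北京[首都中国]工作
```

In the paper's pipeline, the enriched sentence would then be passed to the BERT-based corrector, with a denoising filter discarding low-confidence corrections.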


Notes

  1. http://kanji-database.sourceforge.net/


Author information

Corresponding author: Jing Zhou


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, X., Zhou, J., Wang, S., Li, H., Jia, J., Zhu, J. (2022). Chinese Spelling Error Detection and Correction Based on Knowledge Graph. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11216-4

  • Online ISBN: 978-3-031-11217-1

  • eBook Packages: Computer Science, Computer Science (R0)
