
Chinese Spelling Error Detection and Correction Based on Knowledge Graph

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13248)

Abstract

Spelling error correction is the task of detecting and correcting errors in a natural language sentence. In this paper, we consider Chinese spelling error correction (CSC) for generality. A previous state-of-the-art method for this task connects a detection network with a BERT-based correction network via soft masking. This method addresses BERT's insufficient capability to detect the position of errors. However, by analyzing its results, we find that it still lacks sufficient inference ability and world knowledge. To solve this issue, we propose a novel correction approach based on knowledge graphs (KGs), which queries triples from KGs and injects them into the sentences as domain knowledge. Moreover, we leverage MLM as correction to improve the inference ability of BERT and adopt a denoising filter to increase the accuracy of the results. Experimental results on the SIGHAN dataset verify that our approach outperforms state-of-the-art methods.
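The abstract describes injecting KG triples into the input sentence as extra context. As a minimal illustrative sketch (the function name, triple format, and bracketed injection syntax are assumptions, not the paper's actual implementation), the idea of appending matched triples after entity mentions could look like:

```python
def inject_triples(sentence, kg):
    """Append matching KG triples after each entity mention in the sentence.

    kg maps an entity string to a list of (relation, object) pairs.
    Triples are injected inline in brackets; a real system would instead
    feed them to the model via a dedicated input structure.
    """
    out = []
    i = 0
    while i < len(sentence):
        matched = None
        # Greedily match the longest known entity starting at position i.
        for entity in sorted(kg, key=len, reverse=True):
            if sentence.startswith(entity, i):
                matched = entity
                break
        if matched:
            facts = "".join(f"[{rel}{obj}]" for rel, obj in kg[matched])
            out.append(matched + facts)
            i += len(matched)
        else:
            out.append(sentence[i])
            i += 1
    return "".join(out)


# Example: inject the (capital-of, China) fact after the mention of Beijing.
kg = {"北京": [("首都", "中国")]}
print(inject_triples("我在北京工作", kg))  # 我在北京[首都中国]工作
```

In the paper's pipeline, the enriched sentence would then be passed to the BERT-based corrector, with a denoising filter discarding low-confidence corrections.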


Notes

  1. http://kanji-database.sourceforge.net/


Author information

Corresponding author: Jing Zhou


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, X., Zhou, J., Wang, S., Li, H., Jia, J., Zhu, J. (2022). Chinese Spelling Error Detection and Correction Based on Knowledge Graph. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11216-4

  • Online ISBN: 978-3-031-11217-1

  • eBook Packages: Computer Science, Computer Science (R0)
