Online Self-boost Learning for Chinese Grammatical Error Correction

  • Conference paper
  • In: Natural Language Processing and Chinese Computing (NLPCC 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13551)

Abstract

Grammatical error correction (GEC) aims to automatically detect and correct grammatical errors in sentences. With the development of deep learning, neural machine translation-based approaches have become the mainstream for this task. Recently, Chinese GEC has attracted growing attention. However, Chinese GEC faces two main problems that limit model learning: (1) insufficient data and (2) flexible error forms. In this paper, we attempt to address these limitations by proposing a method called online self-boost learning for Chinese GEC. Online self-boost learning enables the model to generate, from each original sample within each batch, multiple instances whose errors target the model's weaknesses, and to learn from the new data immediately without additional I/O. Taking advantage of a property of the new data, namely that all generated instances share one target, we introduce a consistency loss that drives the model to produce similar distributions for different inputs with the same target. Our method fully exploits the potential knowledge in the annotated data; meanwhile, it can incorporate unlabeled data, extending it to a semi-supervised method. Extensive experiments and analyses demonstrate the effectiveness of our method, and it achieves a state-of-the-art result on the Chinese GEC benchmark.
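As a concrete illustration of the two ideas above (in-batch generation of extra erroneous inputs that all share one gold target, plus a distribution-level consistency loss), here is a minimal PyTorch-style sketch of one training step. It is an assumption-laden sketch, not the authors' implementation: `perturb`, `self_boost_step`, `k`, and `alpha` are hypothetical names, random token replacement stands in for the paper's weakness-targeted error generation, and a symmetric KL term stands in for its consistency loss.

```python
# Hypothetical sketch of an online self-boost training step (not the paper's code).
# Assumes a seq2seq model whose forward(src, tgt) returns logits of shape (B, T, V).
import torch
import torch.nn.functional as F


def perturb(src: torch.Tensor, vocab_size: int, noise_prob: float = 0.15) -> torch.Tensor:
    """Make one noisy variant of a batch of source sentences by random token
    replacement -- a simple stand-in for weakness-targeted error generation."""
    mask = torch.rand(src.shape, device=src.device) < noise_prob
    random_tokens = torch.randint_like(src, high=vocab_size)
    return torch.where(mask, random_tokens, src)


def self_boost_step(model, src, tgt, vocab_size, k=2, alpha=1.0):
    """One batch: the original source plus k generated variants all share the
    same gold target, so the new data is learned in place without extra I/O."""
    inputs = [src] + [perturb(src, vocab_size) for _ in range(k)]
    log_probs = [F.log_softmax(model(x, tgt), dim=-1) for x in inputs]  # each (B, T, V)

    # Supervised loss: every input must still be corrected to the same target.
    ce = sum(F.nll_loss(lp.transpose(1, 2), tgt) for lp in log_probs) / len(log_probs)

    # Consistency loss: symmetric KL pulls each variant's output distribution
    # toward the original's, i.e., similar distributions for different inputs
    # that share one target.
    consistency = 0.0
    for lp in log_probs[1:]:
        consistency = consistency + 0.5 * (
            F.kl_div(lp, log_probs[0], reduction="batchmean", log_target=True)
            + F.kl_div(log_probs[0], lp, reduction="batchmean", log_target=True)
        )
    consistency = consistency / k

    return ce + alpha * consistency
```

In practice one would also mask padding tokens out of both losses and bias the generated errors toward the model's own confusions; the sketch only mirrors the structure described in the abstract.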

Notes

  1. https://github.com/nusnlp/m2scorer.
  2. https://github.com/fastnlp/CPT.
  3. https://github.com/pytorch/fairseq.


Acknowledgement

This research was supported by the National Natural Science Foundation of China under grant No. 61976119 and the Natural Science Foundation of Tianjin under grant No. 18ZXZNGX00310.

Author information

Corresponding author

Correspondence to Jie Liu.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xie, J., Dang, K., Liu, J. (2022). Online Self-boost Learning for Chinese Grammatical Error Correction. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol. 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_30

  • DOI: https://doi.org/10.1007/978-3-031-17120-8_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17119-2

  • Online ISBN: 978-3-031-17120-8

  • eBook Packages: Computer Science, Computer Science (R0)
