Combining trigram and automatic weight distribution in Chinese spelling error correction

  • Correspondence
Journal of Computer Science and Technology

Abstract

Research on spelling correction, which aims to detect errors in text, tends to focus on context-sensitive spelling error correction, which is more difficult than traditional isolated-word error correction. This paper presents CInsunSpell, a novel and efficient algorithm for Chinese spelling error correction. The system works in two phases: a checking phase and a correcting phase. In the first phase, a trigram algorithm operating within a fixed-size window locates potential errors in a local area. The second phase employs a new method that automatically and dynamically distributes weights among the characters in the confusion set as well as in the Bayesian language model. The tactics used above exhibit good performance.
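The two-phase idea can be illustrated with a small sketch. It is not the CInsunSpell implementation, whose details the abstract does not give. The toy corpus, the function names (`train_trigrams`, `check`, `correct`), the `threshold` and `alpha` parameters, and the rule of normalizing smoothed trigram counts into weights over the confusion set are all assumptions made for illustration.

```python
from collections import Counter


def train_trigrams(corpus):
    """Count character trigrams over a list of sentences."""
    counts = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - 2):
            counts[sentence[i:i + 3]] += 1
    return counts


def covering_trigrams(sentence, pos):
    """Return the trigrams of `sentence` that contain position `pos`."""
    first = max(0, pos - 2)
    last = min(pos, len(sentence) - 3)
    return [sentence[s:s + 3] for s in range(first, last + 1)]


def check(sentence, counts, threshold=1):
    """Checking phase (illustrative): flag a position when every trigram
    in the local window around it is rarer than `threshold`."""
    suspects = []
    for pos in range(len(sentence)):
        window = covering_trigrams(sentence, pos)
        if window and all(counts[t] < threshold for t in window):
            suspects.append(pos)
    return suspects


def correct(sentence, pos, confusion, counts, alpha=0.1):
    """Correcting phase (illustrative): score each candidate from the
    confusion set by its smoothed trigram support, normalize the scores
    into weights that sum to 1, and substitute the heaviest candidate."""
    candidates = confusion.get(sentence[pos], [sentence[pos]])
    scores = {}
    for cand in candidates:
        trial = sentence[:pos] + cand + sentence[pos + 1:]
        support = sum(counts[t] for t in covering_trigrams(trial, pos))
        scores[cand] = alpha + support  # alpha is an assumed smoothing term
    total = sum(scores.values())
    weights = {cand: score / total for cand, score in scores.items()}
    best = max(weights, key=weights.get)
    return sentence[:pos] + best + sentence[pos + 1:], weights


if __name__ == "__main__":
    # Hypothetical toy data; a real system would train on a large corpus.
    corpus = ["我们今天去学校", "他们今天去公园", "我们明天去学校"]
    confusion = {"笑": ["校", "笑", "效"]}
    counts = train_trigrams(corpus)
    text = "我们今天去学笑"
    for pos in check(text, counts):
        text, weights = correct(text, pos, confusion, counts)
    print(text)
```

In this sketch the weights are recomputed for every suspect position from the local context. That is only one simple reading of "automatic and dynamic" weight distribution; the paper's Bayesian formulation may differ.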


Author information

Corresponding author

Correspondence to Li Jianhua.

Additional information

This research is supported by the National Natural Science Foundation of China under Grant No.69973015.

LI Jianhua was born in 1965. She is a Ph.D. candidate in the School of Computer Science and Technology, Harbin Institute of Technology. Her research interests include text error correction and natural language processing.

WANG Xiaolong was born in 1955. He is a professor and a Ph.D. supervisor in the School of Computer Science and Technology, Harbin Institute of Technology. His research interests include artificial intelligence and natural language processing.

About this article

Cite this article

Li, J., Wang, X. Combining trigram and automatic weight distribution in Chinese spelling error correction. J. Compt. Sci. & Technol. 17, 915–923 (2002). https://doi.org/10.1007/BF02960784

  • DOI: https://doi.org/10.1007/BF02960784