Abstract
Research on spelling correction, which aims to detect errors in text, tends to focus on context-sensitive spelling error correction, a harder problem than traditional isolated-word error correction. This paper presents CInsunSpell, a novel and efficient algorithm for Chinese spelling error correction. The system works in two phases: a checking phase and a correcting phase. In the first phase, a trigram algorithm operating within a fixed-size window is designed to locate potential errors in a local area. The second phase employs a new method that automatically and dynamically distributes weights among the characters in the confusion set as well as in the Bayesian language model. Together, these tactics exhibit good performance.
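The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the CInsunSpell implementation: the abstract does not give the window size, the smoothing scheme, or the weight-distribution rule, so everything below is an assumption made for illustration. The function names (`train_trigrams`, `flag_suspects`, `best_candidate`), the add-one smoothing, the `threshold` parameter, and the three-character window are all hypothetical. The correcting step ranks confusion-set candidates by plain trigram probability and does not attempt the dynamic weight distribution the paper proposes.

```python
from collections import Counter


def train_trigrams(corpus):
    """Count character trigrams and their two-character prefixes."""
    tri, bi = Counter(), Counter()
    for sentence in corpus:
        for i in range(len(sentence) - 2):
            tri[sentence[i:i + 3]] += 1
            bi[sentence[i:i + 2]] += 1
    return tri, bi


def trigram_prob(tri, bi, gram, vocab_size):
    """Add-one smoothed estimate of P(c3 | c1 c2) (smoothing is an assumption)."""
    return (tri[gram] + 1) / (bi[gram[:2]] + vocab_size)


def covering_starts(text, i):
    """Start indices of the trigrams in `text` that contain position i."""
    return [s for s in (i - 2, i - 1, i) if 0 <= s <= len(text) - 3]


def flag_suspects(text, tri, bi, vocab_size, threshold):
    """Checking phase (sketch): flag a character when every trigram
    covering it falls below the hypothetical probability threshold."""
    suspects = []
    for i in range(len(text)):
        starts = covering_starts(text, i)
        if starts and all(
            trigram_prob(tri, bi, text[s:s + 3], vocab_size) < threshold
            for s in starts
        ):
            suspects.append(i)
    return suspects


def best_candidate(text, i, confusion_set, tri, bi, vocab_size):
    """Correcting phase (sketch): pick the confusion-set character that
    maximizes the product of the trigram probabilities covering position i."""
    def score(ch):
        cand = text[:i] + ch + text[i + 1:]
        p = 1.0
        for s in covering_starts(cand, i):
            p *= trigram_prob(tri, bi, cand[s:s + 3], vocab_size)
        return p
    return max(confusion_set, key=score)


# Toy data: Latin letters stand in for Chinese characters.
toy_corpus = ["abcdefg"] * 10
toy_tri, toy_bi = train_trigrams(toy_corpus)
toy_vocab = len(set("".join(toy_corpus)))
toy_suspects = flag_suspects("abcxefg", toy_tri, toy_bi, toy_vocab, threshold=0.2)
toy_fix = best_candidate("abcxefg", 3, ["x", "d", "q"], toy_tri, toy_bi, toy_vocab)
```

On the toy data, the only flagged position is the substituted character, and the candidate restored from the confusion set is the one seen in training. Requiring every covering trigram to be improbable keeps a single bad character from flagging its neighbours, which is one simple way to localize an error inside a small window.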
Additional information
This research is supported by the National Natural Science Foundation of China under Grant No. 69973015.
LI Jianhua was born in 1965. She is a Ph.D. candidate in the School of Computer Science and Technology, Harbin Institute of Technology. Her research interests include text error correction and natural language processing.
WANG Xiaolong was born in 1955. He is a professor and a Ph.D. supervisor in the School of Computer Science and Technology, Harbin Institute of Technology. His research interests include artificial intelligence and natural language processing.
About this article
Cite this article
Li, J., Wang, X. Combining trigram and automatic weight distribution in Chinese spelling error correction. J. Compt. Sci. & Technol. 17, 915–923 (2002). https://doi.org/10.1007/BF02960784
DOI: https://doi.org/10.1007/BF02960784