Using Large N-gram for Vietnamese Spell Checking

  • Conference paper
Knowledge and Systems Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

Spell checking is the process of detecting misspelled words and then correcting them or suggesting corrections. In this paper, we present a context-based spell checking system and report our experimental results on Vietnamese. The system uses an N-gram model built from a large corpus. The N-grams are compressed to save memory. Furthermore, we use the context on both sides of each syllable to improve the system's performance. Our system achieves high accuracy, with an F-score of approximately 94% on Vietnamese text.
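
As a rough illustration of using context on both sides of a syllable, the sketch below scores each candidate by combining bigram counts with its left and right neighbours. It is not the authors' implementation: the toy corpus, the hand-written confusion set, the add-one smoothing, and the function names `train_bigrams`, `context_score`, and `correct` are all assumptions made for this example, and the N-gram compression mentioned in the abstract is not shown.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-separated syllables."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def context_score(left, candidate, right, unigrams, bigrams):
    """Score a candidate syllable using both neighbours.

    Product of add-one-smoothed P(candidate | left) and P(right | candidate).
    The smoothing choice is an assumption, not taken from the paper.
    """
    vocab = len(unigrams)
    p_left = (bigrams[(left, candidate)] + 1) / (unigrams[left] + vocab)
    p_right = (bigrams[(candidate, right)] + 1) / (unigrams[candidate] + vocab)
    return p_left * p_right


def correct(sentence, confusion_sets, unigrams, bigrams):
    """Greedily replace each syllable with its best-scoring candidate."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    for i in range(1, len(tokens) - 1):
        # Candidates: the hypothetical confusion set plus the syllable itself.
        candidates = set(confusion_sets.get(tokens[i], ())) | {tokens[i]}
        tokens[i] = max(
            sorted(candidates),  # sorted for deterministic tie-breaking
            key=lambda c: context_score(
                tokens[i - 1], c, tokens[i + 1], unigrams, bigrams
            ),
        )
    return " ".join(tokens[1:-1])


# Toy data, invented for illustration only.
corpus = ["tôi đi học", "tôi đi làm", "chúng tôi đi học"]
unigrams, bigrams = train_bigrams(corpus)
print(correct("tôi di học", {"di": {"đi", "dì"}}, unigrams, bigrams))
```

With this toy corpus, the candidate "đi" is preferred over "di" because it is supported by both the preceding syllable "tôi" and the following syllable "học". A real system would need a far larger corpus and a principled way to generate candidates.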

Author information

Corresponding author

Correspondence to Nguyen Thi Xuan Huong.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Thi Xuan Huong, N., Dang, TT., Nguyen, TT., Le, AC. (2015). Using Large N-gram for Vietnamese Spell Checking. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_49

  • DOI: https://doi.org/10.1007/978-3-319-11680-8_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11679-2

  • Online ISBN: 978-3-319-11680-8

  • eBook Packages: Engineering, Engineering (R0)
