ABSTRACT
This paper presents a new Vietnamese spelling correction system that allows users to correct spelling errors in their text. Our system is an interactive writing assistant that integrates advanced natural language processing technologies to (i) identify spelling errors and (ii) replace those errors with their corrected versions. To the best of our knowledge, our system is the first Vietnamese spelling correction tool that interacts with users via a web interface and Microsoft Word and Chrome extensions to provide the best user experience. We also perform automatic and human evaluations to demonstrate the effectiveness of our system. Our system is publicly available at https://grammar.vinai.io/.