
BERT-Inspired Progressive Stacking to Enhance Spelling Correction in Bengali Text

Published: 08 August 2024

Abstract

Common spell checkers in the current digital era struggle with languages such as Bengali, which employ letters differently from English. In response, we have built an improved Bidirectional Encoder Representations from Transformers (BERT)–based spell checker that incorporates a convolutional neural network (CNN) sub-model (Semantic Network). Our novelty, which we term progressive stacking, focuses on improving BERT model training while speeding up the correction process. Comparing shallow and deep variants, we found that deeper models can require less training time, and this technique holds promise for improving spelling correction. As a test set, we categorized and used a 6,300-word dataset supplied by Nayadiganta Mohiuddin, some of whose words contained spelling errors. Its most frequent words matched those in the Prothom-Alo artificial error dataset.
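For readers unfamiliar with progressive stacking (introduced by Gong et al., 2019), the sketch below illustrates the core idea in PyTorch: a shallow Transformer encoder is trained first, and its trained layers are then copied to initialize a deeper encoder, which continues training from that warm start. This is an illustrative sketch only, not the paper's implementation; the names make_encoder and progressive_stack, the layer sizes, and the plain nn.TransformerEncoder are assumptions standing in for the authors' actual BERT-plus-CNN architecture.

```python
import torch.nn as nn

def make_encoder(num_layers: int, d_model: int = 256, nhead: int = 4) -> nn.TransformerEncoder:
    """Build a plain Transformer encoder of the given depth (sizes are illustrative)."""
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

def progressive_stack(shallow: nn.TransformerEncoder) -> nn.TransformerEncoder:
    """Double the depth by re-using the trained shallow layers to initialize a deeper encoder."""
    old_layers = list(shallow.layers)
    deep = make_encoder(num_layers=2 * len(old_layers))
    for i, layer in enumerate(old_layers):
        # Layer i and layer i + L of the deep encoder both start from shallow layer i.
        deep.layers[i].load_state_dict(layer.state_dict())
        deep.layers[i + len(old_layers)].load_state_dict(layer.state_dict())
    return deep

# Usage: pretrain a 3-layer encoder, stack it to 6 layers, then continue training the deep model.
shallow = make_encoder(num_layers=3)
# ... train `shallow` on the spelling-correction objective ...
deep = progressive_stack(shallow)
```

Because the deeper model starts from weights that already encode useful representations, the remaining training typically converges faster than training the deep model from scratch, which is the training-time saving the abstract refers to.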


      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 8
      August 2024, 343 pages
      EISSN: 2375-4702
      DOI: 10.1145/3613611

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 August 2024
      Online AM: 05 July 2024
      Accepted: 24 May 2024
      Revised: 05 April 2024
      Received: 10 October 2023
      Published in TALLIP Volume 23, Issue 8

      Author Tags

      1. BERT
      2. neural networks
      3. progressive stacking
      4. Bengali
      5. text tagging

      Qualifiers

      • Research-article
