Abstract
The Bangla grammatical error correction (BGEC) task aims to automatically identify and correct syntactic, morphological, semantic, and punctuation mistakes in written Bangla text using computational models, improving the precision and fluency of the language. The task matters because it supports language learning and effective communication and keeps written expression clear and accurate, reducing the risk of ambiguity or unintended meaning. Prior work has sought to overcome the limitations of rule-based and statistical methods with machine learning and deep learning methods, which aim to improve accuracy by capturing intricate linguistic patterns, contextual cues, and semantic nuances. In this study, we address the absence of a baseline for the task by developing a large-scale parallel corpus of 7.7M source-target pairs and exploring the untapped potential of transformers. Alongside the corpus, we introduce Panini, an efficient Vaswani-style monolingual transformer-based Bangla grammatical error corrector that leverages transfer learning. Panini sets the state of the art for the task, surpassing BanglaT5 and T5-Small by 18.81% and 23.8% in accuracy and by 11.5 and 15.6 in SacreBLEU score, respectively. The empirical findings indicate that the method captures intricate linguistic rules and patterns better than the other approaches. We also evaluate the method on the Bangla paraphrase task, where it outperforms the previous state-of-the-art method as well. The BanglaGEC corpus and Panini, along with the baselines for BGEC and the Bangla paraphrase task, are publicly accessible at https://tinyurl.com/BanglaGEC.




Data availability
Datasets will be made available on request.
References
Rozovskaya A, Roth D (2019) Grammar error correction in morphologically rich languages: the case of Russian. Trans Assoc Comput Linguist 7:1–17
Hu L, Tang Y, Wu X, Zeng J (2022) Considering optimization of English grammar error correction based on neural network. Neural Comput Appl 66:1–13
Grundkiewicz R, Junczys-Dowmunt M, Heafield K (2019) Neural grammatical error correction systems with unsupervised pre-training on synthetic data. In: Proceedings of the fourteenth workshop on innovative use of NLP for building educational applications, pp 252–263
Wang Y, Wang Y, Dang K, Liu J, Liu Z (2021) A comprehensive survey of grammatical error correction. ACM Trans Intell Syst Technol 12(5):1–51
Hasan KA, Mondal A, Saha A (2010) A context free grammar and its predictive parser for Bangla grammar recognition. In: 2010 13th International conference on computer and information technology (ICCIT). IEEE, pp 87–91
Hasan K, Mondal A, Saha A et al (2012) Recognizing Bangla grammar using predictive parser. arXiv preprint arXiv:1201.2010
Islam MA, Hasan KA, Rahman MM (2012) Basic HPSG structure for Bangla grammar. In: 2012 15th International conference on computer and information technology (ICCIT). IEEE, pp 185–189
Purohit PP, Hoque MM, Hassan MK (2014) An empirical framework for semantic analysis of Bangla sentences. In: 2014 9th International forum on strategic technology (IFOST). IEEE, pp 34–39
Purohit PP, Hoque MM, Hassan MK (2014) Feature based semantic analyzer for parsing Bangla complex and compound sentences. In: The 8th International conference on software, knowledge, information management and applications (SKIMA 2014). IEEE, pp 1–7
Karim MS, Robi FRH, Hossain MM, Rahman MT et al (2018) Implementation and performance evaluation of semantic features analysis system for Bangla assertive, imperative and interrogative sentences. In: 2018 International conference on Bangla speech and language processing (ICBSLP). IEEE, pp 1–5
Hasan KA, Hozaifa M, Dutta S (2014) Detection of semantic errors from simple Bangla sentences. In: 2014 17th International conference on computer and information technology (ICCIT). IEEE, pp 296–299
Rabbi RZ, Shuvo MIR, Hasan KA (2016) Bangla grammar pattern recognition using shift reduce parser. In: 2016 5th International conference on informatics, electronics and vision (ICIEV). IEEE, pp 229–234
Al Hadi A, Khan MYA, Sayed MA (2016) Extracting semantic relatedness for Bangla words. In: 2016 5th International conference on informatics, electronics and vision (ICIEV). IEEE, pp 10–14
Alamgir T, Arefin MS (2017) An empirical framework for parsing Bangla imperative, optative and exclamatory sentences. In: 2017 International conference on electrical, computer and communication engineering (ECCE). IEEE, pp 164–169
Khatun S, Hoque MM (2018) Semantic analysis of Bengali sentences. In: 2018 International conference on Bangla speech and language processing (ICBSLP). IEEE, pp 1–6
Saha Prapty A, Rifat Anwar M, Azharul Hasan K (2021) A rule-based parsing for Bangla grammar pattern detection. In: Proceedings of international joint conference on advances in computational intelligence: IJCACI 2020. Springer, pp 319–331
Afroz S, Susmoy M, Anjum F, Nowshin N (2021) Examining lexical and grammatical difficulties in Bengali language using NLP with machine learning. PhD thesis, Brac University
Faisal AMF, Rahman MA, Farah T (2021) A rule-based Bengali grammar checker. In: 2021 Fifth world conference on smart trends in systems security and sustainability (WorldS4). IEEE, pp 113–117
Alam M, UzZaman N, Khan M et al (2007) N-gram based statistical grammar checker for Bangla and English
Kundu B, Chakraborti S, Choudhury SK (2011) NLG approach for Bangla grammatical error correction. In: 9th International conference on natural language processing, ICON, pp 225–230
Kundu B, Chakraborti S, Choudhury SK (2012) Combining confidence score and mal-rule filters for automatic creation of Bangla error corpus: grammar checker perspective. In: Computational linguistics and intelligent text processing: 13th international conference, CICLing 2012, New Delhi, India, March 11–17, 2012, Proceedings, Part II 13. Springer, pp 462–477
Sinha M, Dasgupta T, Jana A, Basu A (2014) Design and development of a Bangla semantic lexicon and semantic similarity measure. Int J Comput Appl 975:8887
Khan NH (2014) Verification of Bangla sentence structure using n-gram. Glob J Comput Sci Technol 14:1–5
Rahman MR, Habib MT, Rahman MS, Shuvo SB, Uddin MS (2016) An investigative design based statistical approach for determining Bangla sentence validity. Int J Comput Sci Netw Secur 16(11):30–37
Nipu AS, Pal U (2017) A machine learning approach on latent semantic analysis for ambiguity checking on Bengali literature. In: 2017 20th International conference of computer and information technology (ICCIT). IEEE, pp 1–4
Husna A, Mostofa M, Khatun A, Islam J, Mahin M (2018) A framework for word clustering of Bangla sentences using higher order n-gram language model. In: 2018 International conference on innovation in engineering and technology (ICIET). IEEE, pp 1–6
Rana MM, Sultan MT, Mridha M, Khan MEA, Ahmed MM, Hamid MA (2018) Detection and correction of real-word errors in Bangla language. In: 2018 International conference on Bangla speech and language processing (ICBSLP). IEEE, pp 1–4
Mridha M, Rana MM, Hamid MA, Khan MEA, Ahmed MM, Sultan MT (2019) An approach for detection and correction of missing word in Bengali sentence. In: 2019 International conference on electrical, computer and communication engineering (ECCE). IEEE, pp 1–4
Rahman MR, Habib MT, Rahman MS, Islam GZ, Khan MAA (2020) An exploratory research on grammar checking of Bangla sentences using statistical language models. Int J Electr Comput Eng 10(3):3244–3252
Hossain N, Islam S, Huda MN (2021) Development of Bangla spell and grammar checkers: resource creation and evaluation. IEEE Access 9:141079–141097
Kundu SB, Chakraborti S, Choudhury SK (2013) Complexity guided active learning for Bangla grammar correction. In: 10th International conference on natural language processing, ICON, vol 1, p 4
Mridha M, Hamid MA, Rana MM, Khan MEA, Ahmed MM, Sultan MT (2019) Semantic error detection and correction in Bangla sentence. In: 2019 Joint 8th international conference on informatics, electronics & vision (ICIEV) and 2019 3rd international conference on imaging, vision & pattern recognition (icIVPR). IEEE, pp 184–189
Islam S, Sarkar MF, Hussain T, Hasan MM, Farid DM, Shatabda S (2018) Bangla sentence correction using deep neural network based sequence to sequence learning. In: 2018 21st International conference of computer and information technology (ICCIT). IEEE, pp 1–6
Shajalal M, Aono M (2018) Semantic textual similarity in Bengali text. In: 2018 International conference on Bangla speech and language processing (ICBSLP). IEEE, pp 1–5
Abujar S, Masum AKM, Chowdhury SMH, Hasan M, Hossain SA (2019) Bengali text generation using bi-directional RNN. In: 2019 10th International conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–5
Rakib OF, Akter S, Khan MA, Das AK, Habibullah KM (2019) Bangla word prediction and sentence completion using GRU: an extended version of RNN on n-gram language model. In: 2019 International conference on sustainable technologies for Industry 4.0 (STI). IEEE, pp 1–6
Islam MS, Mousumi SSS, Abujar S, Hossain SA (2019) Sequence-to-sequence Bangla sentence generation with LSTM recurrent neural networks. Procedia Comput Sci 152:51–58
Pandit R, Sengupta S, Naskar SK, Dash NS, Sardar MM (2019) Improving semantic similarity with cross-lingual resources: a study in Bangla—a low resourced language. In: Informatics, vol 6. MDPI, p 19
Noshin Jahan M, Sarker A, Tanchangya S, Abu Yousuf M (2020) Bangla real-word error detection and correction using bidirectional LSTM and bigram hybrid model. In: Proceedings of international conference on trends in computational and cognitive engineering: proceedings of TCCE 2020. Springer, pp 3–13
Chowdhury MAH, Mumenin N, Taus M, Yousuf MA (2021) Detection of compatibility, proximity and expectancy of Bengali sentences using long short term memory. In: 2021 2nd International conference on robotics, electrical and signal processing techniques (ICREST). IEEE, pp 233–237
Iqbal MA, Sharif O, Hoque MM, Sarker IH (2021) Word embedding based textual semantic similarity measure in Bengali. Procedia Comput Sci 193:92–101
Anbukkarasi S, Varadhaganapathy S (2022) Neural network-based error handler in natural language processing. Neural Comput Appl 66:1–10
Dhar AC, Roy A, Habib MA, Akhand M, Siddique N (2022) Transformer deep learning model for Bangla–English machine translation. In: Proceedings of 2nd international conference on artificial intelligence: advances and applications: ICAIAA 2021. Springer, pp 255–265
Aurpa TT, Sadik R, Ahmed MS (2022) Abusive Bangla comments detection on Facebook using transformer-based deep learning models. Soc Netw Anal Min 12(1):24
Bijoy MH, Hossain N, Islam S, Shatabda S (2022) DPCSpell: a transformer-based detector–purificator–corrector framework for spelling error correction of Bangla and resource scarce Indic languages. arXiv preprint arXiv:2211.03730
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:66
Akil A, Sultana N, Bhattacharjee A, Shahriyar R (2022) BanglaParaphrase: a high-quality Bangla paraphrase dataset. arXiv preprint arXiv:2210.05109
Shahgir H, Sayeed KS (2023) Bangla grammatical error detection using T5 transformer model. arXiv preprint arXiv:2303.10612
Junczys-Dowmunt M, Grundkiewicz R, Dwojak T, Hoang H, Heafield K, Neckermann T, Seide F, Germann U, Aji AF, Bogoychev N et al (2018) Marian: fast neural machine translation in C++. arXiv preprint arXiv:1804.00344
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485–5551
Acknowledgements
This research is funded by the Institute of Advanced Research (Grant No. UIU/IAR/02/2021/SE/22), United International University, Bangladesh.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Panini was one of the earliest linguists and grammarians; Bangla grammar follows the rules he set.
About this article
Cite this article
Hossain, N., Bijoy, M.H., Islam, S. et al. Panini: a transformer-based grammatical error correction method for Bangla. Neural Comput & Applic 36, 3463–3477 (2024). https://doi.org/10.1007/s00521-023-09211-7
DOI: https://doi.org/10.1007/s00521-023-09211-7