Automatic back transliteration of Romanized Bengali (Banglish) to Bengali

Shibli, G. M. Shahariar; Shawon, Md. Tanvir Rouf; Nibir, Anik Hassan; Miandad, Md. Zabed; Mandal, Nibir Chandra

doi:10.1007/s42044-022-00122-9

Research
Published: 01 November 2022

Volume 6, pages 69–80, (2023)

Iran Journal of Computer Science

G. M. Shahariar Shibli¹,
Md. Tanvir Rouf Shawon¹,
Anik Hassan Nibir¹,
Md. Zabed Miandad¹ &
…
Nibir Chandra Mandal²


Abstract

Back transliteration of Romanized Bengali (Banglish) to Bengali is the process of converting Bengali text written in the Latin alphabet back into the Bengali script. It is often done to make such text more readable for Bengali speakers, typically with a simple rule-based system or an interactive transliteration tool. Many approaches to back transliteration from Romanized Bengali to Bengali exist, and most of them are either grapheme-based or phoneme-based. This paper introduces a pipeline that combines nine open-source back transliteration tools to automatically back transliterate Romanized Bengali to Bengali. The pipeline consists of seven steps: (1) processing the Romanized Bengali input; (2) acquiring human transliterations for performance comparison; (3) employing the transliteration tools; (4) generating candidate transliterations; (5) post-processing the candidate transliterations; (6) selecting the best candidate transliteration; and (7) evaluating the quality of the transliterations with several performance metrics. Experimental results on 1000 Romanized Bengali texts show that our approach produced the highest BLEU-1 score of 81.28, BLEU-2 score of 60.75, BLEU-3 score of 44.45, and BLEU-4 score of 30.46, and the lowest average Word Error Rate (WER) and Word Information Lost (WIL) of 29.21 and 43.68, respectively. In terms of recall, we achieved a ROUGE-L score of 0.7190.
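To make the two error metrics named in the abstract concrete, the following is a minimal sketch of how Word Error Rate and Word Information Lost can be computed for one reference and one candidate transliteration. It is not the authors' evaluation code. The function names `align_counts`, `wer`, and `wil` are hypothetical, whitespace tokenization is an assumption, and the tie-breaking rule (among minimum-edit alignments, prefer the one with the most matching words) is also an assumption, since the hit count can differ between equally cheap alignments. The formulas follow the standard definitions: WER = (S + D + I) / N_ref and WIL = 1 - H² / (N_ref · N_hyp), where H is the number of matching words.

```python
def align_counts(reference: str, hypothesis: str):
    """Return (edits, hits, n_ref, n_hyp) for a minimum-edit word alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # Each cell holds (edit_count, -hit_count). Tuple comparison picks the
    # fewest edits first and, among ties, the most hits (an assumed tie-break).
    prev = [(j, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [(i, 0)]
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                # Matching word: no extra edit, one more hit.
                diag = (prev[j - 1][0], prev[j - 1][1] - 1)
            else:
                # Substitution: one extra edit.
                diag = (prev[j - 1][0] + 1, prev[j - 1][1])
            delete = (prev[j][0] + 1, prev[j][1])
            insert = (curr[j - 1][0] + 1, curr[j - 1][1])
            curr.append(min(diag, delete, insert))
        prev = curr
    edits, neg_hits = prev[-1]
    return edits, -neg_hits, len(ref), len(hyp)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    edits, _, n_ref, _ = align_counts(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return edits / n_ref


def wil(reference: str, hypothesis: str) -> float:
    """Word Information Lost: 1 - (hits / n_ref) * (hits / n_hyp)."""
    _, hits, n_ref, n_hyp = align_counts(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    if n_hyp == 0:
        return 1.0  # nothing was produced, so all information is lost
    return 1.0 - (hits / n_ref) * (hits / n_hyp)


if __name__ == "__main__":
    human = "আমি ভাত খাই"      # illustrative reference transliteration
    candidate = "আমি ভাত খাব"  # illustrative candidate with one wrong word
    print(f"WER = {wer(human, candidate):.4f}")
    print(f"WIL = {wil(human, candidate):.4f}")
```

Both functions return fractions between 0 and 1 for this example; multiplying by 100 gives a percentage scale, which is presumably the scale of the values 29.21 and 43.68 reported above. Averaging over a test set would mean computing the score per text and taking the mean.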

Availability of supporting data

Links to all the tools used in this work are given in the footnotes, and the dataset we developed is available at https://github.com/nibir1234/banglish_to_bengali.

Acknowledgements

Not applicable.

Funding

No funding was received for conducting this study.

Author information

Authors and Affiliations

Ahsanullah University of Science and Technology, Dhaka, Bangladesh
G. M. Shahariar Shibli, Md. Tanvir Rouf Shawon, Anik Hassan Nibir & Md. Zabed Miandad
University of Virginia, Charlottesville, VA, USA
Nibir Chandra Mandal

Contributions

Shibli and Shawon set the research scope, coordinated the research, wrote code, ran a few experiments, and drafted the manuscript. Nibir and Miandad wrote code, collected data, and ran most of the experiments. Mandal ran a few experiments.

Corresponding author

Correspondence to G. M. Shahariar Shibli.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Human and animal ethics

Not applicable.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shibli, G.M.S., Shawon, M.T.R., Nibir, A.H. et al. Automatic back transliteration of Romanized Bengali (Banglish) to Bengali. Iran J Comput Sci 6, 69–80 (2023). https://doi.org/10.1007/s42044-022-00122-9

Received: 30 August 2022
Accepted: 07 October 2022
Published: 01 November 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s42044-022-00122-9
