
Transliterating Latin to Amharic scripts using user-defined rules and character mappings

International Journal on Digital Libraries

Abstract

As social media platforms become increasingly accessible, the use of new forms of textual communication (posts, comments, chats, etc.) in local-language scripts such as Amharic has increased tremendously. However, many users prefer to post comments in Latin script rather than the local one, because Latin keyboards offer more convenient character input. Existing Latin-to-Amharic transliteration systems do not account for double consonants and double vowels, which causes transliteration errors. Further, because existing systems follow multiple character mapping conventions, social media texts exhibit a wide variety of user-adopted spellings, and current systems fail to address these gaps and variations. In this work, we present RBLatAm (Rule-Based Latin to Amharic), a generic rule-based transliteration system that converts Amharic words written in Latin script back into their native Amharic script. The system is based on mapping rules engineered from three existing transliteration systems (Microsoft, Google, SERA), additional rules for double consonants, and conventions adopted on social media by speakers of Amharic. When tested on transliterated Amharic words that are not named entities and on named entities of persons, the system achieves an accuracy of 75.8% and 84.6%, respectively. The system also correctly transliterates words reported as errors in previous studies. By making such texts processable even when they were originally produced in Latin script, the system substantially improves the basis for text mining research on Amharic-language texts.
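To make the idea of rule-based mapping concrete, the following is a minimal sketch of a longest-match Latin-to-Ethiopic transliterator. It is not RBLatAm: this preview does not list the system's rules, so the consonant table, the vowel spellings (including the variants "oo" and "ee"), and the names `CONSONANT_BASE`, `VOWELS`, and `transliterate` are all assumptions chosen for illustration. The sketch relies on the layout of the Unicode Ethiopic block, where each consonant occupies a row and the vowel order is an offset within that row. Collapsing a doubled consonant into one syllable is likewise an assumed treatment, motivated by the fact that Ethiopic script does not mark gemination.

```python
# Hypothetical subset of consonants: Latin spelling -> code point of the
# first-order Ethiopic syllable in that consonant's row.
CONSONANT_BASE = {
    "sh": 0x1238,  # ሸ row
    "ch": 0x1278,  # ቸ row
    "l": 0x1208, "m": 0x1218, "r": 0x1228, "s": 0x1230,
    "b": 0x1260, "t": 0x1270, "n": 0x1290, "k": 0x12A8,
    "d": 0x12F0, "g": 0x1308,
}

# Vowel spelling -> offset of the vowel order within a consonant row.
# "oo" and "ee" are assumed user variants for "u" and "i".
VOWELS = {"e": 0, "u": 1, "oo": 1, "i": 2, "ee": 2, "a": 3, "ie": 4, "o": 6}
SIXTH_ORDER = 5  # bare consonant with no following vowel


def _longest_match(text, pos, table):
    """Return the longest key of `table` (2 chars, then 1) found at `pos`."""
    for length in (2, 1):
        key = text[pos:pos + length]
        if key in table:
            return key
    return None


def transliterate(word):
    """Convert a Latin-script word to Ethiopic script using the toy tables."""
    text = word.lower()  # simplification: case distinctions are ignored
    out = []
    i = 0
    while i < len(text):
        cons = _longest_match(text, i, CONSONANT_BASE)
        if cons is None:
            out.append(text[i])  # no rule: copy the character through
            i += 1
            continue
        i += len(cons)
        if text.startswith(cons, i):  # assumed rule: collapse doubled consonant
            i += len(cons)
        vowel = _longest_match(text, i, VOWELS)
        if vowel is None:
            offset = SIXTH_ORDER
        else:
            offset = VOWELS[vowel]
            i += len(vowel)
        out.append(chr(CONSONANT_BASE[cons] + offset))
    return "".join(out)


print(transliterate("selam"))   # ሰላም
print(transliterate("bunna"))   # ቡና
```

Under these assumptions, "selam" and "sellam" produce the same output, as do "bunna" and "boona". A real system would also need rules for word-initial vowels, labialized forms, and ambiguous digraphs, none of which this sketch attempts.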


Notes

  1. https://play.google.com/store/apps/details?id=com.google.android.inputmethod.latin&hl=en_US&gl=US.

  2. https://avantassessment.com/writing-input-guide-windows-10.

  3. https://en.wikipedia.org/wiki/Ge’ez_script.

  4. https://en.wikipedia.org/wiki/Amharic.

  5. https://www.facebook.com/EBCzena

  6. Not all the Amharic characters are displayed on the Table because of space limitations.

  7. https://doi.org/10.5281/zenodo.7317713.

  8. https://www.amazon.com/Amarigna-Tigrigna-Roots-English-Language/dp/1503295192.

  9. https://doi.org/10.5281/zenodo.5723141.


Author information


Corresponding author

Correspondence to Zeleke Abebaw.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Abebaw, Z., Rauber, A. & Atnafu, S. Transliterating Latin to Amharic scripts using user-defined rules and character mappings. Int J Digit Libr 24, 63–75 (2023). https://doi.org/10.1007/s00799-023-00346-5


  • DOI: https://doi.org/10.1007/s00799-023-00346-5
