Transliterating Latin to Amharic scripts using user-defined rules and character mappings

Abebaw, Zeleke; Rauber, Andreas; Atnafu, Solomon

doi:10.1007/s00799-023-00346-5

Published: 02 March 2023

Volume 24, pages 63–75 (2023)

International Journal on Digital Libraries

Abstract

As social media platforms become increasingly accessible, the use of new forms of textual communication (posts, comments, chats, etc.) in local-language scripts such as Amharic has increased tremendously. However, many users prefer to write in Latin script rather than their native script because Latin keyboards offer more convenient character input. Existing Latin-to-Amharic transliteration systems do not account for double consonants and double vowels, which causes transliteration errors. Further, because existing systems follow different character-mapping conventions, social media texts exhibit a wide variety of conventions adopted by users when they produce script. Current systems fail to address these gaps and variations. In this work, we present RBLatAm (Rule-Based Latin to Amharic), a generic rule-based transliteration system that converts Amharic words written in Latin script back into their native Amharic script. The system is based on mapping rules engineered from three existing transliteration systems (Microsoft, Google, SERA), plus additional rules for double consonants and for conventions adopted by Amharic speakers on social media. When tested on transliterated Amharic words of non-named entities and of person named entities, the system achieves accuracies of 75.8% and 84.6%, respectively. It also correctly transliterates words reported as errors in previous studies. This system drastically improves the basis for text-mining research on Amharic-language texts, as it can process such texts even when they were originally produced in Latin script.
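The abstract describes the core mechanism only at a high level: Latin character sequences are matched against mapping rules and composed into Ge'ez syllables, with extra rules for doubled letters and for alternative user conventions. The Python sketch below illustrates that general idea under stated assumptions; it is not RBLatAm's actual rule set. The consonant and vowel tables are a small hypothetical subset following one assumed convention ("e" for the first order, "ie" or "é" for the fifth order, a bare consonant for the sixth order), matching is greedy longest-match, and doubled consonants and vowels are simply collapsed because the Ge'ez script does not mark gemination. All names (`CONSONANTS`, `VOWEL_ORDER`, `ALIASES`, `transliterate`, etc.) are illustrative.

```python
import unicodedata

# Hypothetical subset of consonant rows: Latin key -> code point of the
# first-order form in the Ethiopic Unicode block (e.g. 0x1230 is ሰ).
CONSONANTS = {
    "sh": 0x1238, "ch": 0x1278,
    "h": 0x1200, "l": 0x1208, "m": 0x1218, "r": 0x1228, "s": 0x1230,
    "q": 0x1240, "b": 0x1260, "t": 0x1270, "n": 0x1290, "k": 0x12A8,
    "w": 0x12C8, "z": 0x12D8, "y": 0x12E8, "d": 0x12F0, "j": 0x1300,
    "g": 0x1308, "f": 0x1348,
}

# Assumed vowel convention -> offset of the syllable "order" within a row.
VOWEL_ORDER = {"e": 0, "u": 1, "i": 2, "a": 3, "ie": 4, "o": 6}
SIXTH_ORDER = 5          # consonant with no following vowel (e.g. ም)
GLOTTAL_ROW = 0x12A0     # አ row, used for vowels without a preceding consonant
STANDALONE = {"a": 0, "u": 1, "i": 2, "ie": 4, "e": 5, "o": 6}

# Hypothetical alternative spellings, normalized to the assumed convention.
ALIASES = {"é": "ie"}


def tokenize(word):
    """Greedy longest-match split into consonant/vowel keys (2 chars, then 1)."""
    tokens, i = [], 0
    while i < len(word):
        for size in (2, 1):
            piece = word[i:i + size]
            if len(piece) == size and (piece in CONSONANTS or piece in VOWEL_ORDER):
                tokens.append(piece)
                i += size
                break
        else:
            tokens.append(word[i])  # unknown character: pass it through
            i += 1
    return tokens


def collapse_doubles(tokens):
    """Drop a letter token identical to its predecessor ("ll" -> "l", "aa" -> "a")."""
    out = []
    for tok in tokens:
        if out and tok == out[-1] and (tok in CONSONANTS or tok in VOWEL_ORDER):
            continue
        out.append(tok)
    return out


def transliterate_word(word):
    word = unicodedata.normalize("NFC", word.lower())
    for alias, key in ALIASES.items():
        word = word.replace(alias, key)
    tokens = collapse_doubles(tokenize(word))
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in CONSONANTS:
            if nxt in VOWEL_ORDER:  # consonant + vowel -> one syllable
                out.append(chr(CONSONANTS[tok] + VOWEL_ORDER[nxt]))
                i += 2
                continue
            out.append(chr(CONSONANTS[tok] + SIXTH_ORDER))
        elif tok in STANDALONE:     # vowel without a preceding consonant
            out.append(chr(GLOTTAL_ROW + STANDALONE[tok]))
        else:
            out.append(tok)
        i += 1
    return "".join(out)


def transliterate(text):
    """Transliterate whitespace-separated words (spacing is normalized)."""
    return " ".join(transliterate_word(w) for w in text.split())


print(transliterate("selam alle"))  # ሰላም አለ
```

Here "alle" and "ale" both yield አለ because gemination is dropped, and "biet" and "bét" both yield ቤት through the alias table. A full system would need every consonant row, the labialized forms, and a policy for conflicts between the Microsoft, Google, and SERA conventions, none of which this sketch attempts.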

Notes

https://play.google.com/store/apps/details?id=com.google.android.inputmethod.latin&hl=en_US&gl=US.
https://avantassessment.com/writing-input-guide-windows-10.
https://en.wikipedia.org/wiki/Ge'ez_script.
https://en.wikipedia.org/wiki/Amharic.
https://www.facebook.com/EBCzena.
Not all Amharic characters are displayed in the table because of space limitations.
https://doi.org/10.5281/zenodo.7317713.
https://www.amazon.com/Amarigna-Tigrigna-Roots-English-Language/dp/1503295192.
https://doi.org/10.5281/zenodo.5723141.

Author information

Andreas Rauber and Solomon Atnafu have contributed equally to this work.

Authors and Affiliations

IT Doctoral Program, Addis Ababa University, 1176, Addis Ababa, Ethiopia
Zeleke Abebaw
Institute of Information Systems Engineering, Technical University of Vienna, Favoritenstraße 9-11/194-04, 1040, Vienna, Austria
Andreas Rauber
Department of Computer Science, Addis Ababa University, 1176, Addis Ababa, Ethiopia
Solomon Atnafu

Corresponding author

Correspondence to Zeleke Abebaw.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Abebaw, Z., Rauber, A. & Atnafu, S. Transliterating Latin to Amharic scripts using user-defined rules and character mappings. Int J Digit Libr 24, 63–75 (2023). https://doi.org/10.1007/s00799-023-00346-5

Received: 18 January 2022
Revised: 21 January 2023
Accepted: 22 January 2023
Published: 02 March 2023
Issue Date: March 2023
DOI: https://doi.org/10.1007/s00799-023-00346-5