Abstract
The scarcity of language resources, and of computational lexicons in particular, is one of the central obstacles to progress in Moroccan Arabic NLP. A second obstacle is reusability: the few resources that exist are difficult to employ in common NLP tasks. In this paper, we propose to alleviate both problems by building a reusable bilingual lexicon covering Moroccan Arabic and Arabic. To this end, we compiled data from several sources, including printed and digital lexicons as well as speech and social media text. To meet users' needs in a systematic and easily accessible way, the resource is structured according to the Lexical Markup Framework standard and hosted in a software architecture that fully respects interoperability rules. The lexicon contains almost 13,000 lemmas with their Arabic equivalents and is manually annotated with useful metadata such as part of speech, origin, and root. It is designed for practical NLP uses such as morphological analysis and automatic translation.
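To make the structure described above more concrete, the following minimal sketch shows how one bilingual entry might be serialized in an LMF-style XML layout. It is an illustration under stated assumptions, not the authors' actual schema: the element names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`, `Equivalent`, `feat`) follow common LMF conventions, while the feature names (`origin`, `root`), the language codes, the helper functions, and the sample word are hypothetical and not taken from the lexicon itself.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> child to parent."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def build_entry(lexicon, lemma, pos, origin, root, arabic_equivalent):
    """Add one lexical entry carrying the metadata named in the abstract.

    Feature names here are assumptions chosen for illustration.
    """
    entry = ET.SubElement(lexicon, "LexicalEntry")
    feat(entry, "partOfSpeech", pos)
    feat(entry, "origin", origin)
    if root:  # loanwords may have no Arabic root, so the feature is optional
        feat(entry, "root", root)
    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)
    # The Arabic equivalent is attached to a sense of the entry.
    sense = ET.SubElement(entry, "Sense")
    equivalent = ET.SubElement(sense, "Equivalent")
    feat(equivalent, "language", "ara")  # assumed code for Arabic
    feat(equivalent, "writtenForm", arabic_equivalent)
    return entry


resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon")
feat(lexicon, "language", "ary")  # assumed code for Moroccan Arabic

# Illustrative entry only: a French loanword meaning "car".
build_entry(lexicon, "طوموبيل", "noun", "French", None, "سيارة")

xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```

Keeping every annotation as an attribute-value `feat` pair means a consumer such as a morphological analyzer or a translation component can read the entries with any generic XML parser, which is the kind of reuse the abstract aims for.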
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tachicart, R., Bouzaoubaa, K., Namly, D. (2025). Compiling a Bilingual Lexicon Using a Semi-automatic Approach. In: Hdioud, B., Aouragh, S.L. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2024. Communications in Computer and Information Science, vol 2340. Springer, Cham. https://doi.org/10.1007/978-3-031-80438-0_8
DOI: https://doi.org/10.1007/978-3-031-80438-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-80437-3
Online ISBN: 978-3-031-80438-0
eBook Packages: Computer Science (R0)