ABSTRACT
In this paper, we present a broad-coverage morpho-syntactic description of the lexical entries of standard/classical Arabic. The work takes the form of an electronic dictionary named Al-Erfan, built with the operators of the NooJ platform and implemented as local grammars in the form of finite-state transducers (FSTs). Our work is inspired by Z. Harris's mathematical model (transformations) and by the "lexicon-grammar" theoretical framework developed by Maurice Gross. The starting point of our approach is the fundamental fact that Arabic morphology rests on the merger of two components: root and pattern. This contrasts with the set-theoretic approach represented by the formula Prefix-Lemma-Suffix, which is specific to the morpho-syntactic system of the Latin languages. Our approach merges 480 Arabic patterns with 12,400 commonly used roots, which together form the base of all morpho-syntactic derivation in the language. Implementing this process with the computational-linguistic techniques of the NooJ platform has enabled us to generate more than 120 million entries covering all morpho-lexical categories. These data are all contained in the Al-Erfan electronic dictionary, which was developed from a database built manually over the past 20 years in several research laboratories specialized in ANLP. We conclude the article by examining the category V-a as extracted from the Al-Erfan electronic dictionary.
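The root/pattern merger described in the abstract can be illustrated with a minimal sketch. This is not the Al-Erfan implementation and not NooJ syntax: the Latin transliteration, the slot digits `1`, `2`, `3`, the function names `merge` and `generate`, and the category labels are all assumptions made for illustration only.

```python
# Hypothetical notation: a pattern is a string in which the digits 1, 2, 3
# mark the positions of the three root consonants; every other character
# (vowels, affixes) is copied unchanged.

def merge(root: str, pattern: str) -> str:
    """Fill the slots 1, 2, 3 of a pattern with the consonants of a triliteral root."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    # Single pass over the pattern, so inserted consonants are never re-substituted.
    return "".join(slots.get(ch, ch) for ch in pattern)


# Toy data (illustrative only): two roots and three patterns with a category label.
ROOTS = ["ktb", "drs"]
PATTERNS = {
    "1a2a3a": "V",
    "1aa2i3": "N (active participle)",
    "ma12uu3": "N (passive participle)",
}


def generate(roots, patterns):
    """Return one (form, root, pattern, category) entry per root/pattern pair."""
    return [
        (merge(root, pattern), root, pattern, category)
        for root in roots
        for pattern, category in patterns.items()
    ]


if __name__ == "__main__":
    for entry in generate(ROOTS, PATTERNS):
        print(entry)
```

The sketch pairs every root with every pattern. A real lexicon would presumably need to record which pairs are attested, since not every root combines with every pattern.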
REFERENCES
- Beesley, Kenneth R. & Karttunen, Lauri, Finite-state non-concatenative morphotactics, Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-00), 2000, 191–198.
- Boudchiche M., Mazroui A., Ould Abdallahi Ould Bebah M., Lakhouaja A., Boudlal A., AlKhalil Morpho Sys 2: A robust Arabic morpho-syntactic analyzer, Journal of King Saud University - Computer and Information Sciences, Volume 29, Issue 2, 2017, 141–146.
- Buckwalter, Tim, Buckwalter Arabic morphological analyzer, Version 1.0, Linguistic Data Consortium, Philadelphia, 2002.
- Diab, Mona & alii, Automatic processing of Modern Standard Arabic text, Soudi, Abdelhadi (editor), Springer, 2007.
- Dichy J., Linguistic knowledge integration in optical Arabic word and text recognition process, Linguistica Communicatio journal, Special issues, 2013.
- Elghamry, Khaled, A constraint-based algorithm for the identification of Arabic roots, Proceedings of the Midwest Computational Linguistics Colloquium, 2004.
- El Hannach, Mohamed, Syntaxe des verbes psychologiques de l'arabe, Thèse de doctorat d'Etat, Université Paris VII, 1988.
- El Hannach, Mohamed, Syntaxe des verbes qualitatifs de l'arabe, Synergie monde arabe, Vol. I, 2001.
- Farghaly, Ali, Handbook for language engineers, CSLI Publications, 2003.
- Goldsmith, John A., An algorithm for the unsupervised learning of morphology, Natural Language Engineering, 12 (4), 2006, 353–371.
- Gross, Maurice, Méthodes en syntaxe, Hermann, Paris, 1975.
- Harris, Zellig S., Structure mathématique du langage, Dunod, Paris, 1972.
- Isabelle T., Apprentissage automatique pour le TAL, inria-00541535, 2010.
- Beesley, Kenneth R., Arabic finite-state morphological analysis and generation, Rank Xerox Research Centre, Grenoble, 2009.
- Shaalan, Khaled, Allam, Amin & Gomah, Abdallah, Towards automatic spell checking for Arabic, Conference on Language Engineering, ELSE, Cairo, Egypt, 2003, 36.
- Habash, Nizar & Roth, Ryan M., CATiB: The Columbia Arabic Treebank, Proceedings of the ACL-IJCNLP Conference Short Papers, 2009, 221–224.
- Soudi, Abdelhadi & alii, Arabic Computational Morphology: Knowledge-Based and Empirical Methods, Springer, 2007.
- Saleh Najim, Inheritance-based approach to Arabic verbal root-and-pattern morphology, Soudi A. (editor), Springer, 2007.
- Silberztein, Max & al., Automatic Processing of Natural-Language Electronic Texts with NooJ, 2015.
- Silberztein, Max, La formalisation des langues : l'approche NooJ, ISTE Editions, London, 2015.
- مدخل إلى اللسانيات الحاسوبية، تنسيق عبد الله بن يحي الفيفي، مركز الملك عبد الله للغة العربية، الرياض، 2017 (كتاب جماعي).
- الخلاف بين النحاة البصريين والكوفيين، أبو البركات بن الانباري.
- لغويات المدونة الحاسوبية، المنهج والنظرية والتطبيق، طوني ماك إينري، و أندريو هاردي، ترجمة د. سلطان بن ناصر المجيول، دار جامعة الملك سعود للنشر، 2016.
- المعالجة الآلية للغة العربية، المشاكل والحلول، دة. سلوى حمادة، دار غريب، القاهرة، 2009.
- لغويات المدونة الحاسوبية، تطبيقاتها تحليلية على العربية الطبيعية، د. سلطان المجيول، مركز الملك عبد الله للغة العربية، 2016.
Index Terms
- Formalization of the Arabic grammatical category (V-a) using the NooJ platform