Abstract
Language resources are a necessary component of language development in NLP. They are useful for any empirical language study, including linguistic analysis, translation, and disambiguation. The linguistic development environment NooJ (http://www.nooj4nlp.net/) allows formalizing complex linguistic phenomena such as the generation, processing, and analysis of compound words. NooJ can be used through the dynamic library NoojEngine.dll or the command-line program noojapply.exe. In this study, we take advantage of noojapply.exe, which is freely available in the Standard edition of NooJ. It allows users to apply dictionaries and grammars automatically to texts from external environments.
In this paper, we introduce a rule-based grammar module for recognizing Arabic multi-word expressions (MWEs). The module recognizes several types of morphosyntactic variation that a multi-word expression can undergo. These linguistic resources are then compiled and passed as parameters to the noojapply.exe command line so that they can be integrated within an Arabic language processing environment for linguistic disambiguation. Our work is divided into three sections. First, we review the literature on disambiguation tasks in Arabic. Then, we give a detailed description of our integrated NooJ environment for Arabic linguistic disambiguation and its associated grammars. Finally, we carry out a set of tests and experiments to measure the impact of multi-word expression recognition on word disambiguation.
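The integration step described above, in which compiled resources are handed to noojapply.exe from an external environment, might be sketched as follows. This is only a minimal illustration: the argument order (language code, compiled resources, text files), the language code "ar", and all file names are assumptions made for the example and are not taken from the paper or from the NooJ documentation, which should be consulted for the actual command-line syntax.

```python
import subprocess
from typing import List, Sequence


def build_noojapply_command(
    executable: str,
    language: str,
    resources: Sequence[str],
    texts: Sequence[str],
) -> List[str]:
    """Assemble an argument list for a noojapply-style call.

    ASSUMPTION: the order (language code, compiled resources, texts)
    is illustrative only; check the NooJ manual for the real syntax.
    """
    if not resources:
        raise ValueError("at least one compiled dictionary or grammar is required")
    if not texts:
        raise ValueError("at least one text file is required")
    return [executable, language, *resources, *texts]


def run_noojapply(command: Sequence[str]) -> str:
    """Run the assembled command and return its standard output as text."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


# Hypothetical file names, used only to show the shape of the call.
command = build_noojapply_command(
    "noojapply.exe",
    "ar",
    ["mwe_dictionary.nod", "mwe_grammar.nog"],
    ["corpus.txt"],
)
```

Building the argument list separately from running it keeps the call easy to inspect and log before anything is executed; `run_noojapply(command)` would then launch the program on a machine where NooJ is installed.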
Notes
1. “El-DicAr”: Electronic Dictionary for Arabic linguistic resources.
2. Verb, noun, and adjective codes in El-DicAr are listed in the Appendix.
3. Apocopate: cutting off the last sound or syllable of a word.
4. A form of the noun in some languages that shows the relationship of possession or origin between one thing and another.
Appendix
| Verb codes in El-DicAr | P | Transitive - Indicative |
| | I | Intransitive - Past |
| | S | Subjunctive |
| | C | Apocopate |
| | F | Future |
| | Y | Imperative |
| | A | Active form |
| | K | Passive form |
| Noun and adjective codes in El-DicAr | a | Accusative |
| | u | Nominative |
| | i | Genitive |
| | an | Tanwin, Accusative |
| | un | Tanwin, Nominative |
| | in | Tanwin, Genitive |
| Noun and adjective codes in El-DicAr | 1, 2, 3 | 1st, 2nd, 3rd person |
| | M, f | Male, female |
| | S, d, p | Singular, dual, plural |
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Najar, D., Mesfar, S., Ghezela, H.B. (2022). Integrated NooJ Environment for Arabic Linguistic Disambiguation Improvement Using MWEs. In: González, M., Reyes, S.S., Rodrigo, A., Silberztein, M. (eds) Formalizing Natural Languages: Applications to Natural Language Processing and Digital Humanities. NooJ 2022. Communications in Computer and Information Science, vol 1758. Springer, Cham. https://doi.org/10.1007/978-3-031-23317-3_16
DOI: https://doi.org/10.1007/978-3-031-23317-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23316-6
Online ISBN: 978-3-031-23317-3
eBook Packages: Computer Science (R0)