Integrated NooJ Environment for Arabic Linguistic Disambiguation Improvement Using MWEs

  • Conference paper
Formalizing Natural Languages: Applications to Natural Language Processing and Digital Humanities (NooJ 2022)

Abstract

Language resources are a necessary component of language development in NLP. They are useful for any empirical language study, including linguistic analysis, translation, and disambiguation. The linguistic development environment NooJ (http://www.nooj4nlp.net/) allows users to formalize complex linguistic phenomena such as the generation, processing, and analysis of compound words. NooJ can be used through the dynamic library NoojEngine.dll or through the command-line program noojapply.exe. In this study, we take advantage of noojapply.exe, which is freely available in the Standard edition of NooJ. It allows users to apply dictionaries and grammars automatically to texts from external environments.

In this paper, we introduce a module for Arabic multiword expression (MWE) recognition that relies on rule-based grammars. The MWE module recognizes several types of morphosyntactic variation that a multiword expression can undergo. These linguistic resources are then compiled and passed as parameters to the command-line program noojapply.exe, so that they can be integrated within an Arabic language processing environment for linguistic disambiguation. Our work is divided into three sections. First, we review the literature on disambiguation tasks in the Arabic language. Then, we give a detailed description of our integrated NooJ environment for Arabic linguistic disambiguation and the associated grammars. Finally, we carry out a set of tests and experiments to measure the impact of multiword expression recognition on word disambiguation.
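As an illustration of calling noojapply.exe from an external environment, the following Python sketch assembles a command line and, optionally, runs it. It is not the authors' integration code. The argument order (language code, then compiled resources, then text files), the language code "ar", and the file names El-DicAr.nod, mwe.nog and corpus.txt are assumptions made for the example; the NooJ manual should be checked for the exact syntax of a given NooJ version.

```python
import subprocess
from typing import List, Sequence


def build_noojapply_command(
    executable: str,
    language: str,
    resources: Sequence[str],
    texts: Sequence[str],
) -> List[str]:
    """Assemble an argument list for the noojapply command-line program.

    ASSUMPTION: the order (language, resources, texts) is illustrative
    only and is not taken from the paper.
    """
    if not resources:
        raise ValueError("at least one dictionary or grammar is required")
    if not texts:
        raise ValueError("at least one text file is required")
    return [executable, language, *resources, *texts]


def run_noojapply(command: Sequence[str]) -> str:
    """Run the assembled command and return whatever it writes to stdout."""
    completed = subprocess.run(
        list(command), capture_output=True, text=True, check=True
    )
    return completed.stdout


# Hypothetical file names: a compiled dictionary, a compiled MWE grammar,
# and a text file to analyse.
command = build_noojapply_command(
    "noojapply.exe",
    "ar",
    ["El-DicAr.nod", "mwe.nog"],
    ["corpus.txt"],
)
print(command)
```

Building the argument list separately from running it keeps the sketch testable on machines where NooJ is not installed; run_noojapply(command) would only be called where the executable and the resource files actually exist.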

Notes

  1. “El-DicAr”: Electronic Dictionary for Arabic linguistic resources.

  2. Verb, noun and adjective codes in El-DicAr are listed in the Appendix.

  3. Apocopate: cutting off the last sound or syllable of a word.

  4. A form of the noun in some languages that shows a relationship of possession or origin between one thing and another.

Author information

Corresponding author

Correspondence to Dhekra Najar.

Appendix

Verb codes in El-DicAr

Code    Meaning
P       Transitive - Indicative
I       Intransitive - Past
S       Subjunctive
C       Apocopate
F       Future
Y       Imperative
A       Active form
K       Passive form
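The verb codes can be read as a simple lookup table. The sketch below decodes a tag into readable labels, using a subset of the codes listed above whose labels name a single property. The tag string "V+S+A" and its "+"-separated layout are assumptions made for the example, not an entry taken from El-DicAr.

```python
from typing import List, Tuple

# Subset of the verb codes from the Appendix table; the labels are copied
# from that table.
VERB_CODES = {
    "S": "Subjunctive",
    "C": "Apocopate",
    "F": "Future",
    "Y": "Imperative",
    "A": "Active form",
    "K": "Passive form",
}


def decode_verb_tag(tag: str) -> Tuple[str, List[str]]:
    """Split a '+'-separated tag into its category and readable labels.

    ASSUMPTION: the first field is the category (for example "V") and
    every following field is a single code from VERB_CODES.
    """
    category, *codes = tag.split("+")
    labels = [VERB_CODES.get(code, f"unknown code: {code}") for code in codes]
    return category, labels


# Hypothetical tag, for illustration only.
print(decode_verb_tag("V+S+A"))  # ('V', ['Subjunctive', 'Active form'])
```

Codes that are not in the table are reported rather than silently dropped, which makes missing or mistyped codes easy to spot.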

Noun and adjective codes in El-DicAr

Code    Meaning
a       Accusative
u       Nominative
i       Genitive
an      Tanwin, Accusative
un      Tanwin, Nominative
in      Tanwin, Genitive

Noun and adjective codes in El-DicAr

Code       Meaning
1, 2, 3    1st, 2nd, 3rd person
M, f       Masculine, feminine
S, d, p    Singular, dual, plural

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Najar, D., Mesfar, S., Ghezela, H.B. (2022). Integrated NooJ Environment for Arabic Linguistic Disambiguation Improvement Using MWEs. In: González, M., Reyes, S.S., Rodrigo, A., Silberztein, M. (eds) Formalizing Natural Languages: Applications to Natural Language Processing and Digital Humanities. NooJ 2022. Communications in Computer and Information Science, vol 1758. Springer, Cham. https://doi.org/10.1007/978-3-031-23317-3_16

  • DOI: https://doi.org/10.1007/978-3-031-23317-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23316-6

  • Online ISBN: 978-3-031-23317-3

  • eBook Packages: Computer Science, Computer Science (R0)
