Parsing Arabic with a Semi-automatically Generated TAG: Dealing with Linguistic Phenomena

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2018)

Abstract

Arabic is a challenging language for grammar production and parsing. It combines complex linguistic phenomena with a rich morphology, which makes its processing particularly ambiguous. This led us to choose the Tree-Adjoining Grammar (TAG) formalism: TAG provides sufficient constraints for handling diverse linguistic phenomena and appears well suited to representing Arabic syntactic structures. In this paper, we present a semi-automatically generated TAG for Modern Standard Arabic, built with XMG (eXtensible MetaGrammar), a metagrammatical description language and its compiler. We describe the linguistic coverage of our grammar and show how we used the properties of TAG and XMG to define different linguistic phenomena in an expressive and concise way. To check the coverage of our grammar, we set up a development environment that includes a parser and a test corpus of linguistic phenomena containing both grammatical and ungrammatical sentences.
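Part of what makes a metagrammar concise is that tree fragments are combined by conjunction and disjunction, so one short description stands for many elementary trees. The sketch below illustrates only that combinatorics, under stated assumptions: it uses a hypothetical Python encoding (not XMG syntax), and the helper names `frag`, `expand`, `objet_canon`, `transitive` and `ditransitive` are invented for illustration. It reuses the fragment names given in the notes (EpineVerbe, ObjetCanonSN, ObjetCanonClit, ObjetIndCanon). The real XMG compiler additionally solves each combined description into zero or more tree models, so actual tree counts may differ.

```python
from itertools import product

# Hypothetical encoding (not XMG syntax): a class is a tuple whose first
# element is its kind: ("frag", name), ("or", *alternatives), ("and", *parts).

def frag(name):
    """A single tree fragment, identified by name."""
    return ("frag", name)

def expand(cls):
    """Enumerate the fragment combinations a class denotes.

    Returns a list of tuples; each tuple lists the fragments that would be
    conjoined into one tree description.
    """
    kind = cls[0]
    if kind == "frag":
        return [(cls[1],)]
    if kind == "or":
        # Disjunction: the alternatives of every branch are collected.
        return [alt for sub in cls[1:] for alt in expand(sub)]
    if kind == "and":
        # Conjunction: every choice in one part combines with every choice
        # in the others (Cartesian product); each combination is flattened.
        combos = product(*(expand(sub) for sub in cls[1:]))
        return [tuple(f for part in combo for f in part) for combo in combos]
    raise ValueError(f"unknown class kind: {kind!r}")

def objet_canon(role):
    """ObjetCanon[role] as the three-way disjunction given in the notes."""
    return ("or",
            frag(f"ObjetCanonSN[{role}]"),
            frag(f"ObjetCanonClit[{role}]"),
            frag(f"ObjetIndCanon[{role}]"))

# Hypothetical verb classes: the verbal spine plus one or two object slots.
transitive = ("and", frag("EpineVerbe"), objet_canon("Objet1"))
ditransitive = ("and", frag("EpineVerbe"),
                objet_canon("Objet1"), objet_canon("Objet2"))

if __name__ == "__main__":
    print(len(expand(transitive)))    # 3
    print(len(expand(ditransitive)))  # 9
    print(expand(transitive)[0])      # ('EpineVerbe', 'ObjetCanonSN[Objet1]')
```

Each object slot written as a disjunction multiplies the number of alternatives, so adding a second object turns three descriptions into nine without writing any of them out by hand.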

Notes

  1. https://sourcesup.renater.fr/xmg/frenchmetagrammar/index.html.
  2. http://homepages.inf.ed.ac.uk/s0896251/XMG-basedXTAG/titlepage.html.
  3. http://www.sfs.uni-tuebingen.de/emmy/res.html.
  4. XMG2 extends XMG with a meta-metagrammar compiler.
  5. A black node is a resource and can be unified with zero or more white nodes; a white node is a need and must be unified with a black node; a red node is saturated and cannot be unified with any other node.
  6. In our metagrammatical description, tree fragment names are in French (e.g. EpineVerbe), and so are syntactic categories (e.g. SV for Syntagme Verbal, i.e. verb phrase).
  7. For the elliptical subject.
  8. ObjetCanon[Objet1] \(\longrightarrow\) ObjetCanonSN[Objet1] \(\vee\) ObjetCanonClit[Objet1] \(\vee\) ObjetIndCanon[Objet1]
  9. ObjetCanon[Objet2] \(\longrightarrow\) ObjetCanonSN[Objet2] \(\vee\) ObjetCanonClit[Objet2] \(\vee\) ObjetIndCanon[Objet2]
  10. To reduce the size of the figure, some features have been omitted.
  11. We did not include phrasal structures.
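The node colors described in note 5 behave like a small combination table. The following is a simplified sketch, assuming the color arithmetic usually described for XMG: black with white gives black, white with white gives white, and any identification involving a red node or two black nodes fails. It models only this color check, not XMG's tree-description solver, and the names `Color`, `combine`, `merge_all` and `is_saturated` are hypothetical.

```python
from enum import Enum

class Color(Enum):
    BLACK = "black"  # resource: may absorb zero or more white nodes
    WHITE = "white"  # need: must end up identified with a black node
    RED = "red"      # saturated: cannot be identified with any other node

def combine(a, b):
    """Color of the node obtained by identifying two nodes, or None if forbidden."""
    if Color.RED in (a, b):
        return None          # a red node cannot be unified with anything
    if a is Color.BLACK and b is Color.BLACK:
        return None          # two resources cannot be merged
    if Color.BLACK in (a, b):
        return Color.BLACK   # the need is filled by the resource
    return Color.WHITE       # white + white: still an unmet need

def merge_all(colors):
    """Fold combine over all nodes identified with one another."""
    if not colors:
        raise ValueError("at least one node color is required")
    result = colors[0]
    for c in colors[1:]:
        result = combine(result, c)
        if result is None:
            return None
    return result

def is_saturated(tree_colors):
    """A finished tree is acceptable only if no white (unmet) node remains."""
    return all(c is not Color.WHITE for c in tree_colors)

print(merge_all([Color.BLACK, Color.WHITE, Color.WHITE]))  # Color.BLACK
print(merge_all([Color.BLACK, Color.BLACK]))                # None
print(is_saturated([Color.BLACK, Color.RED]))               # True
```

Because black absorbs any number of white nodes, a single resource node can satisfy several fragments that each declare the same need, which is what lets independently written fragments be glued together safely.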

Author information

Corresponding author

Correspondence to Cherifa Ben Khelil.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Ben Khelil, C., Othmane Zribi, C.B., Duchier, D., Parmentier, Y. (2023). Parsing Arabic with a Semi-automatically Generated TAG: Dealing with Linguistic Phenomena. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13397. Springer, Cham. https://doi.org/10.1007/978-3-031-23804-8_17

  • DOI: https://doi.org/10.1007/978-3-031-23804-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23803-1

  • Online ISBN: 978-3-031-23804-8

  • eBook Packages: Computer Science, Computer Science (R0)