
A Purely Surface-Oriented Approach to Handling Arabic Morphology

  • Conference paper
Formal Grammar (FG 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11668)


Abstract

In this paper, we introduce a completely lexicalist approach to Arabic morphology. This purely surface-oriented treatment is part of a comprehensive mathematical approach that integrates Arabic syntax and semantics, using overt morphological features in the string-to-meaning translation. The basic motivation of our approach is to combine semantic representations with formal descriptions of morphological units. That is, the lexicon is a collection of signs; each sign \(\delta\) is a triple \(\delta = \langle E, C, M\rangle\), where E is the exponent, C is the combinatorics and M is the meaning of the sign. Here, we are concerned only with the exponents, i.e. the components of a morphosemantic lexicon (for a fragment of Arabic). To remain surface-oriented, we allow for discontinuity in the constituents: constituents are sequences of strings, which can only be concatenated or duplicated; no rule can delete, add or modify any string. Arabic morphology is well known for its complexity and richness. Word formation in Arabic poses real challenges because words are derived from roots, which bear the core meaning of their derivatives, by inserting vowels and possibly other consonants. The units in the sequences are so-called glued strings rather than plain strings. A glued string is a string that has left and right context conditions. Ideally, morphs are combined in a definite and exceptionless linear way, as happens in many languages (e.g. the plural in English). Arabic word formation, however, is more complex: it is not just a sequential concatenation of morphs placed next to each other, since the constituents are discontinuous. Vowels and further consonants are inserted between, before and after the root consonants, resulting in what we call a “fractured glued string”, i.e. a sequence of glued strings combined in diverse ways: forward concatenation, backward concatenation, forward wrapping, reduction, forward transfixation and, going beyond multiple context-free grammars (MCFGs), also reduplication.
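To make the idea of discontinuous constituents concrete, the following is a minimal Python sketch, not the formalization used in the paper. It assumes a constituent can be modelled as a tuple of plain strings and leaves out the left and right context conditions of glued strings entirely. The function names (`forward_concat`, `backward_concat`, `transfix`, `reduplicate`) and the example forms are illustrative assumptions. Each operation only concatenates or duplicates its inputs, in line with the restriction that no rule deletes or modifies a string.

```python
from typing import Tuple

# Assumption: a discontinuous constituent is modelled as a tuple of string parts.
Fractured = Tuple[str, ...]


def forward_concat(x: Fractured, y: Fractured) -> Fractured:
    """Glue the last part of x to the first part of y; keep all other parts."""
    return x[:-1] + (x[-1] + y[0],) + y[1:]


def backward_concat(x: Fractured, y: Fractured) -> Fractured:
    """Same as forward_concat, but y comes first."""
    return forward_concat(y, x)


def transfix(root: Fractured, pattern: Fractured) -> str:
    """Interleave root parts with pattern parts: p0 r0 p1 r1 ... r(n-1) pn."""
    if len(pattern) != len(root) + 1:
        raise ValueError("pattern needs exactly one more part than the root")
    pieces = [pattern[0]]
    for root_part, pattern_part in zip(root, pattern[1:]):
        pieces.append(root_part)
        pieces.append(pattern_part)
    return "".join(pieces)


def reduplicate(x: Fractured) -> Fractured:
    """Duplicate the whole sequence; nothing is deleted or altered."""
    return x + x


# Illustrative root k-t-b in a Buckwalter-style transliteration.
ktb: Fractured = ("k", "t", "b")
print(transfix(ktb, ("", "a", "a", "a")))      # kataba
print(transfix(ktb, ("", "A", "i", "")))       # kAtib
print(forward_concat(("Al",), ("kAtib",)))     # ('AlkAtib',)
print("".join(reduplicate(("zal",))))          # zalzal
```

Reduplication is the operation that copies material, which is why the abstract singles it out as going beyond MCFGs; the other operations in this sketch use each input part exactly once.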


Notes

  1. The basic notions of the proposed approach are introduced in [3]. In this paper, however, we concentrate on those most relevant to the nature of Arabic morphology and on how they apply to it.

  2. In this work, we use Buckwalter’s transliteration scheme, with some modifications to make our work easier.

  3. This should not be confused with the term sign as used in the abstract. A sign here is just plus \(+\) or minus \(-\).

References

  1. CIA: CIA World Fact Book. Central Intelligence Agency, Washington, D.C. (2018)

  2. Habash, N.: Introduction to Arabic Natural Language Processing. Morgan and Claypool Publishers (2010)

  3. Kracht, M.: Agreement morphology, argument structure and syntax. Revision 8 (2016, unpublished manuscript)

  4. Ryding, K.: A Reference Grammar of Modern Standard Arabic. Cambridge University Press, Cambridge (2005)

  5. Al-Sughaiyer, I., Al-Kharashi, I.: Arabic morphological analysis techniques: a comprehensive survey. J. Assoc. Inf. Sci. Technol. 55(3), 189–213 (2004)

  6. Soudi, A., Neumann, G., Van den Bosch, A.: Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Springer, Dordrecht (2007). https://doi.org/10.1007/978-1-4020-6046-5

  7. Dichy, J., Farghaly, A.: Grammar-lexis relations in the computational morphology of Arabic. In: Soudi, A., Neumann, G., Van den Bosch, A. (eds.) Arabic Computational Morphology: Knowledge-Based and Empirical Methods, pp. 115–140. Springer, Dordrecht (2007). https://doi.org/10.1007/978-1-4020-6046-5_7

  8. Boudchiche, M., et al.: AlKhalil Morpho Sys 2: a robust Arabic morpho-syntactic analyzer. J. King Saud Univ.-Comput. Inf. Sci. 29(2), 141–146 (2017)

  9. Sawalha, M., Atwell, E.: Comparative evaluation of Arabic language morphological analysers and stemmers. In: Coling 2008: Companion volume: Posters, pp. 107–110 (2008)

  10. Kay, M.: Nonconcatenative finite-state morphology. In: Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, pp. 2–10 (1987)

  11. Beesley, K.: Finite-state morphological analysis and generation of Arabic at Xerox Research: status and plans in 2001. In: ACL Workshop on Arabic Language Processing: Status and Perspective, pp. 1–8 (2001)

  12. Attia, M., et al.: A corpus-based finite-state morphological toolkit for contemporary Arabic. J. Logic Comput. 24(2), 455–472 (2013)

  13. Aboamer, Y., Farghaly, A.: Mariam ComLex: a bi-directional finite state morphological transducer for MSA. In: The 29th Annual Symposium on Arabic Linguistics, University of Wisconsin-Milwaukee, USA (2015)

  14. Buckwalter, T.: Buckwalter Arabic Morphological Analyzer, Version 1.0. Linguistic Data Consortium, University of Pennsylvania, LDC Catalog No. LDC2002L49 (2002). ISBN 1-58563-257-0

  15. Buckwalter, T.: Buckwalter Arabic Morphological Analyzer, Version 2.0. Linguistic Data Consortium, University of Pennsylvania, LDC Catalog No. LDC2004L02 (2004). ISBN 1-58563-324-0

  16. Habash, N., Rambow, O., Roth, R.: MADA + TOKAN: a toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt (2009)

  17. Maamouri, M., et al.: LDC Standard Arabic morphological analyzer SAMA v. 3.1. Linguistic Data Consortium, University of Pennsylvania, LDC Catalog No. LDC2010L01. ISBN 1-58563-555-3

  18. Sawalha, M., Atwell, E., Abushariah, M.: SALMA: standard Arabic language morphological analysis. In: 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), pp. 1–6 (2013)

  19. Abdelali, A., et al.: Farasa: a fast and furious segmenter for Arabic. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pp. 11–16 (2016)

  20. Taji, D., et al.: An Arabic morphological analyzer and generator with copious features. In: Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 140–150 (2018)

  21. Habash, N., Eskander, R., Hawwari, A.: A morphological analyzer for Egyptian Arabic. In: Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology, pp. 1–9 (2012)

  22. Partee, B., ter Meulen, A., Wall, R.: Mathematical Methods in Linguistics. Linguistic Society of America (1990)

  23. Crystal, D.: A Dictionary of Linguistics and Phonetics, 6th edn. Blackwell Publishing Ltd. (2008)

  24. Seki, H., et al.: On multiple context-free grammars. Theor. Comput. Sci. 88(2), 191–229 (1991)

  25. Kracht, M., Aboamer, Y.: Argument structure and referent systems. In: 12th International Conference on Computational Semantics (IWCS) (2017)

  26. McCarthy, J.: A prosodic theory of nonconcatenative morphology. Linguist. Inquiry 12(3), 373–418 (1981)

  27. Kasami, T., Seki, H., Fujii, M.: Generalized context-free grammars, multiple context-free grammars and head grammars. Preprint of WG on Natural Language of IPSJ (1987)

  28. Soudi, A., Cavalli-Sforza, V., Jamari, A.: The Arabic noun system generation. In: Proceedings of the International Symposium on the Processing of Arabic (2002)

Author information

Corresponding author

Correspondence to Yousuf Aboamer.

Copyright information

© 2019 Springer-Verlag GmbH Germany, part of Springer Nature

About this paper


Cite this paper

Aboamer, Y., Kracht, M. (2019). A Purely Surface-Oriented Approach to Handling Arabic Morphology. In: Bernardi, R., Kobele, G., Pogodalla, S. (eds) Formal Grammar. FG 2019. Lecture Notes in Computer Science, vol 11668. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59648-7_1


  • DOI: https://doi.org/10.1007/978-3-662-59648-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-59647-0

  • Online ISBN: 978-3-662-59648-7

  • eBook Packages: Computer Science, Computer Science (R0)
