Abstract
In this paper, we introduce a completely lexicalist approach to Arabic morphology. This purely surface-oriented treatment is part of a comprehensive mathematical approach to integrating Arabic syntax and semantics by using overt morphological features in the string-to-meaning translation. The basic motivation of our approach is to combine semantic representations with formal descriptions of morphological units. That is, the lexicon is a collection of signs; each sign \(\delta \) is a triple \(\delta = \langle E, C, M\rangle \), where E is the exponent, C the combinatorics and M the meaning of the sign. Here, we are concerned only with the exponents, i.e. the components of a morphosemantic lexicon (for a fragment of Arabic). To remain surface-oriented, we allow for discontinuous constituents: constituents are sequences of strings, which can only be concatenated or duplicated; no rule can delete, add or modify any string. Arabic morphology is well known for its complexity and richness. Word formation in Arabic poses real challenges because words are derived from roots, which carry the core meaning of their derivatives, by inserting vowels and possibly further consonants. The units in these sequences are so-called glued strings rather than plain strings; a glued string is a string with left and right context conditions. Ideally, morphs are combined in a fixed, non-exceptional linear order, as happens in many cases across languages (e.g. the plural in English). Arabic word formation, however, is more complex: it is not just the sequential concatenation of morphs placed next to each other, because its constituents are discontinuous. Vowels and additional consonants are inserted between, before and after the root consonants, resulting in what we call a “fractured glued string”, i.e. a sequence of glued strings combined in diverse ways: forward concatenation, backward concatenation, forward wrapping, reduction, forward transfixation and, going beyond multiple context-free grammars (MCFGs), also reduplication.
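The combination modes listed above can be illustrated with a small sketch over tuples of strings. The definitions are our own simplified, hypothetical reading of forward and backward concatenation, forward wrapping, reduction, forward transfixation and reduplication; they are not the paper's formal definitions. The function names are illustrative only. The context conditions of glued strings are ignored here. The examples use the root k-t-b 'write' in a simplified, Buckwalter-style transliteration.

```python
from typing import Tuple

# An exponent is modelled as a tuple of strings, i.e. a possibly
# discontinuous constituent. Every operation below only concatenates or
# copies existing strings; no character is deleted, added or changed.
# These definitions are an illustrative reading, not the paper's own.
Exponent = Tuple[str, ...]


def forward_concat(x: Exponent, y: Exponent) -> Exponent:
    """Place y after x, joining the last string of x to the first of y."""
    return x[:-1] + (x[-1] + y[0],) + y[1:]


def backward_concat(x: Exponent, y: Exponent) -> Exponent:
    """Place y before x (forward concatenation with the operands swapped)."""
    return forward_concat(y, x)


def forward_wrap(x: Exponent, y: Exponent) -> Exponent:
    """Insert a one-string exponent y between the two strings of x."""
    if len(x) != 2 or len(y) != 1:
        raise ValueError("wrapping expects a pair and a single string")
    return (x[0] + y[0] + x[1],)


def reduce_tuple(x: Exponent) -> Exponent:
    """Collapse a sequence of strings into a single string."""
    return ("".join(x),)


def transfix(root: Exponent, pattern: Exponent) -> Exponent:
    """Interleave root consonants with vowel slots:
    (c1, c2, c3) and (v1, v2) give c1 v1 c2 v2 c3."""
    if len(root) != len(pattern) + 1:
        raise ValueError("pattern must have one slot fewer than the root")
    out = root[0]
    for vowel, consonant in zip(pattern, root[1:]):
        out += vowel + consonant
    return (out,)


def reduplicate(x: Exponent) -> Exponent:
    """Copy the whole exponent (x -> xx); such copying goes beyond the
    linear rules of MCFGs."""
    return reduce_tuple(x + x)


# Usage with the root k-t-b 'write' (simplified transliteration):
root = ("k", "t", "b")
perfect = forward_concat(transfix(root, ("a", "a")), ("a",))  # ('kataba',) 'he wrote'
stem = transfix(root, ("", "u"))                              # ('ktub',)
imperfect = forward_wrap(("ya", "u"), stem)                   # ('yaktubu',) 'he writes'
redup = forward_concat(reduplicate(("zal",)), ("a",))         # ('zalzala',) 'he shook'
```

In this sketch, the discontinuous prefix-suffix pair ("ya", "u") is treated as a single two-part exponent that wraps the stem. This reflects the idea that constituents are sequences of strings rather than single strings.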
Notes
1. The basic notions of the proposed approach are introduced in [3]. In this paper, however, we concentrate on those most relevant to the nature of Arabic morphology and on how they apply to it.
2. In this work, we use Buckwalter’s transliteration model, with some modifications for convenience.
3. This use of the term sign should not be confused with the sign \(\delta = \langle E, C, M\rangle \) of the abstract: a sign here is simply plus \(+\) or minus −.
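The abstract describes glued strings as strings with left and right context conditions. Note 3 says only that a sign is plus or minus. The sketch below assumes, purely for illustration, that each side of a glued string carries such a sign:

- \(+\) is read as "something must be glued on this side".
- − is read as "nothing may be glued on this side".

The class and function names are hypothetical, and the paper's actual definitions may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical encoding of a glued string: a body plus a left and a right
# context condition. '+' = something must be glued on that side,
# '-' = nothing may be glued on that side, None = no condition.
@dataclass(frozen=True)
class GluedString:
    left: Optional[str]
    body: str
    right: Optional[str]


def can_glue(a: GluedString, b: GluedString) -> bool:
    """a may directly precede b unless either facing side forbids contact."""
    return a.right != "-" and b.left != "-"


def glue(a: GluedString, b: GluedString) -> GluedString:
    """Concatenate a and b; the facing conditions are used up, while the
    outer conditions (a's left, b's right) are kept."""
    if not can_glue(a, b):
        raise ValueError(f"cannot glue {a.body!r} before {b.body!r}")
    return GluedString(a.left, a.body + b.body, b.right)


def is_complete(g: GluedString) -> bool:
    """A glued string can stand alone once no '+' condition is pending."""
    return g.left != "+" and g.right != "+"


# Usage: an imperfect prefix and suffix that must attach to a stem.
prefix = GluedString("-", "ya", "+")
stem = GluedString(None, "ktub", None)
suffix = GluedString("+", "u", "-")
word = glue(glue(prefix, stem), suffix)  # body 'yaktubu', both edges '-'
```

Under this reading, the affixes alone are incomplete because of their pending \(+\) conditions. The suffix cannot precede the stem, since its right side is marked −.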
References
CIA: CIA World Fact Book. Central Intelligence Agency, Washington, D.C. (2018)
Habash, N.: Introduction to Arabic natural language processing. Morgan and Claypool Publishers (2010)
Kracht, M.: Agreement morphology, argument structure and syntax. Revision 8 (2016, unpublished manuscript)
Ryding, K.: A Reference Grammar of Modern Standard Arabic. Cambridge University Press, Cambridge (2005)
Al-Sughaiyer, I., Al-Kharashi, I.: Arabic morphological analysis techniques: a comprehensive survey. J. Assoc. Inf. Sci. Technol. 55(3), 189–213 (2004)
Soudi, A., Neumann, G., Van den Bosch, A.: Arabic Computational Morphology: Knowledge-based and Empirical Methods. Springer, Dordrecht (2007). https://doi.org/10.1007/978-1-4020-6046-5
Dichy, J., Farghaly, A.: Grammar-lexis relations in the computational morphology of Arabic. In: Soudi, A., Neumann, G., Van den Bosch, A. (eds.) Arabic Computational Morphology: Knowledge-based and Empirical Methods, pp. 115–140. Springer, Dordrecht (2007). https://doi.org/10.1007/978-1-4020-6046-5_7
Boudchiche, M., et al.: AlKhalil Morpho Sys 2: a robust Arabic morpho-syntactic analyzer. J. King Saud Univ.-Comput. Inf. Sci. 29(2), 141–146 (2017)
Sawalha, M., Atwell, E.: Comparative evaluation of Arabic language morphological analysers and stemmers. In: Coling 2008: Companion volume: Posters, pp. 107–110 (2008)
Kay, M.: Nonconcatenative finite-state morphology. In: Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, pp. 2–10 (1987)
Beesley, K.: Finite-state morphological analysis and generation of Arabic at Xerox Research: status and plans in 2001. In: ACL Workshop on Arabic Language Processing: Status and Perspective, pp. 1–8 (2001)
Attia, M., et al.: A corpus-based finite-state morphological toolkit for contemporary Arabic. J. Logic Comput. 24(2), 455–472 (2013)
Aboamer, Y., Farghaly, A.: Mariam ComLex: A Bi-Directional Finite State Morphological Transducer for MSA. In: The 29th Annual Symposium on Arabic Linguistics, at the University of Wisconsin-Milwaukee, USA (2015)
Buckwalter, T.: Buckwalter Arabic Morphological Analyzer, Version 1.0. Linguistic Data Consortium, University of Pennsylvania, LDC Catalog No. LDC2002L49 (2002). ISBN 1-58563-257-0
Buckwalter, T.: Buckwalter Arabic Morphological Analyzer, Version 2.0. Linguistic Data Consortium, University of Pennsylvania, LDC Catalog No. LDC2004L02 (2004). ISBN 1-58563-324-0
Habash, N., Rambow, O., Roth, R.: MADA+TOKAN: a toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt (2009)
Maamouri, M., et al.: LDC Standard Arabic morphological analyzer SAMA v. 3.1. Linguistic Data Consortium, University of Pennsylvania, LDC Catalog No. LDC2010L01. ISBN 1-58563-555-3
Sawalha, M., Atwell, E., Abushariah, M.: SALMA: Standard Arabic Language Morphological Analysis. In: 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), pp. 1–6 (2013)
Abdelali, A., et al.: Farasa: a fast and furious segmenter for Arabic. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pp. 11–16 (2016)
Taji, D., et al.: An Arabic morphological analyzer and generator with copious features. In: Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 140–150 (2018)
Habash, N., Eskander, R., Hawwari, A.: A morphological analyzer for Egyptian Arabic. In: Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology, pp. 1–9 (2012)
Partee, B., ter Meulen, A., Wall, R.: Mathematical Methods in Linguistics. Linguistic Society of America (1990)
Crystal, D.: A Dictionary of Linguistics and Phonetics, 6th edn. Blackwell Publishing Ltd. (2008)
Seki, H., et al.: On multiple context-free grammars. Theor. Comput. Sci. 88(2), 191–229 (1991)
Kracht, M., Aboamer, Y.: Argument structure and referent systems. In: 12th International Conference on Computational Semantics IWCS (2017)
McCarthy, J.: A prosodic theory of nonconcatenative morphology. Linguist. Inquiry 12(3), 373–418 (1981)
Kasami, T., Seki, H., Fujii, M.: Generalized Context-free Grammars, Multiple Context-free Grammars and Head Grammars. Preprint of WG on Natural Language of IPSJ (1987)
Soudi, A., Cavalli-Sforza, V., Jamari, A.: The Arabic noun system generation. In: Proceedings of the International Symposium on the Processing of Arabic (2002)
Copyright information
© 2019 Springer-Verlag GmbH Germany, part of Springer Nature
About this paper
Cite this paper
Aboamer, Y., Kracht, M. (2019). A Purely Surface-Oriented Approach to Handling Arabic Morphology. In: Bernardi, R., Kobele, G., Pogodalla, S. (eds) Formal Grammar. FG 2019. Lecture Notes in Computer Science, vol. 11668. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59648-7_1
DOI: https://doi.org/10.1007/978-3-662-59648-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-59647-0
Online ISBN: 978-3-662-59648-7
eBook Packages: Computer Science, Computer Science (R0)