Abstract
A successful computational treatment of multiword expressions (MWEs) in natural languages contributes to robust NLP systems that can handle the long-standing problem of language ambiguity caused primarily by this complex linguistic phenomenon. The first step in addressing this challenge is to build an extensive, reliable MWE language resource (LR) with comprehensive computational representations across all linguistic levels. Such a resource is the cornerstone for understanding the heterogeneous linguistic behaviour of MWEs in their various manifestations. This paper presents a detailed framework for the computational representation of Arabic MWEs (ArMWEs) across all linguistic levels, based on the state-of-the-art Lexical Markup Framework (LMF), with the modifications necessary to suit the distinctive properties of Modern Standard Arabic (MSA). This work forms part of a larger project, JOMAL, which aims to develop a comprehensive computational lexicon of ArMWEs for NLP and language pedagogy (LP).
Appendix
XML fragment for the MWE, fī ʾams alḥāja,
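As a rough illustration of how an LMF-style entry for this expression could be assembled, the following Python sketch builds a small XML fragment with the standard library. It is not the authors' fragment. The element names (LexicalEntry, Lemma, ListOfComponents, Component, feat) follow the naming commonly used in LMF serialisations, while the entry identifier, the component identifiers, the part-of-speech value, and the helper functions feat and build_mwe_entry are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> child to parent."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def build_mwe_entry(entry_id, written_form, components, pos):
    """Build a minimal LMF-style LexicalEntry for a multiword expression.

    All identifiers and feature values are illustrative assumptions,
    not values taken from the JOMAL lexicon.
    """
    entry = ET.Element("LexicalEntry", {"id": entry_id})
    feat(entry, "partOfSpeech", pos)

    # The lemma carries the citation form of the whole expression.
    lemma = ET.SubElement(entry, "Lemma")
    feat(lemma, "writtenForm", written_form)

    # Each component records its position and surface form; the "entry"
    # attribute points at a (hypothetical) single-word lexical entry.
    component_list = ET.SubElement(entry, "ListOfComponents")
    for rank, token in enumerate(components, start=1):
        comp = ET.SubElement(
            component_list, "Component", {"entry": f"{entry_id}_c{rank}"}
        )
        feat(comp, "rank", str(rank))
        feat(comp, "writtenForm", token)
    return entry


entry = build_mwe_entry(
    entry_id="mwe_0001",                 # hypothetical identifier
    written_form="fī ʾams alḥāja",
    components=["fī", "ʾams", "alḥāja"],
    pos="prepositionalPhrase",           # hypothetical tag value
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A fuller representation along the lines the abstract describes would add features for the other linguistic levels (for example morphological, syntactic, and semantic information) to the same entry; those are left out here because their exact form is not given in this text.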
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alghamdi, A., Atwell, E. (2017). Towards Comprehensive Computational Representations of Arabic Multiword Expressions. In: Mitkov, R. (ed.) Computational and Corpus-Based Phraseology. EUROPHRAS 2017. Lecture Notes in Computer Science, vol 10596. Springer, Cham. https://doi.org/10.1007/978-3-319-69805-2_29
DOI: https://doi.org/10.1007/978-3-319-69805-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69804-5
Online ISBN: 978-3-319-69805-2
eBook Packages: Computer Science, Computer Science (R0)