Abstract
This paper addresses the problems raised by the translation of Arabic Multiword Expressions (MWEs), a task made especially difficult by the specialized nature of the parallel corpus (Arabic-French texts). We first extract monolingual MWEs from each side of the parallel corpus. The second step consists of acquiring bilingual (Arabic-French and Arabic-English) correspondences between MWEs. To assess the quality of the mined expressions, we follow a statistical and symbolic approach to task-based evaluation with NooJ Machine Translation (NMT). We investigate the performance of a hybrid strategy for integrating external lexical resources and bilingual MWEs into the NMT system. We also propose two discriminative strategies for integrating Arabic MWEs in a real parsing context (identification with pre-regrouping, and re-ranking of parses), using features dedicated to Arabic MWEs. Experimental results show that representing MWEs as lexical entries improves translation quality.
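The abstract does not spell out the algorithms, and the authors work inside NooJ's dictionaries and grammars rather than in a general-purpose language. As a hedged illustration only, the Python sketch below shows two generic ingredients of such a pipeline. The first is a statistical association score for ranking candidate word pairs on one side of the corpus. Pointwise mutual information (PMI) is an assumption here, since the paper does not say which measure it uses. The second is a greedy longest-match "pre-regrouping" step that joins MWEs found in a lexicon into single tokens before parsing. The function names `pmi_bigrams` and `pre_regroup`, the underscore joiner, and the French example lexicon are all hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper: PMI(x, y) = log2(P(x, y) / (P(x) * P(y))),
    estimated from raw counts. Pairs seen fewer than `min_count` times
    are discarded because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring pairs first: these are the strongest MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def pre_regroup(tokens, lexicon, joiner="_"):
    """Merge known MWEs into single tokens before parsing.

    Hypothetical helper using greedy longest match: at each position,
    try the longest lexicon entry first and fall back to a single token.
    `lexicon` is a set of token tuples, e.g. {("pomme", "de", "terre")}.
    """
    max_len = max((len(entry) for entry in lexicon), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try spans from the longest possible length down to 2 tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lexicon:
                out.append(joiner.join(candidate))
                i += n
                break
        else:
            # No multiword match starting here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    lexicon = {("pomme", "de", "terre"), ("mise", "en", "œuvre")}
    print(pre_regroup("la mise en œuvre du projet".split(), lexicon))
    # ['la', 'mise_en_œuvre', 'du', 'projet']
```

Greedy longest match is a simplification: a discriminative approach such as the re-ranking of parses mentioned above would instead keep several segmentations and let the parser's features choose among them.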
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Rhazi, A., Boulaalam, A. (2018). Corpus-Based Extraction and Translation of Arabic Multi-Words Expressions (MWEs). In: Mbarki, S., Mourchid, M., Silberztein, M. (eds) Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications. NooJ 2017. Communications in Computer and Information Science, vol 811. Springer, Cham. https://doi.org/10.1007/978-3-319-73420-0_12
DOI: https://doi.org/10.1007/978-3-319-73420-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73419-4
Online ISBN: 978-3-319-73420-0
eBook Packages: Computer Science, Computer Science (R0)