Corpus-Based Extraction and Translation of Arabic Multi-Words Expressions (MWEs)

  • Conference paper
Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications (NooJ 2017)

Abstract

This paper addresses the problems that arise in translating Arabic multiword expressions (MWEs), a task made difficult by the specialized nature of the parallel corpus (Arabic-French texts). We first extract monolingual MWEs from each part of the parallel corpus. The second step consists of acquiring bilingual (Arabic-French and Arabic-English) correspondences of MWEs. To assess the quality of the mined expressions, we follow a statistical and symbolic approach in a task-based evaluation of NooJ Machine Translation (NMT). We investigate the performance of a hybrid strategy for integrating external lexical resources and bilingual MWEs into the NMT system. We propose two discriminative strategies for integrating Arabic MWEs in a real parsing context (identification with pre-regrouping, and re-ranking of parses), with features dedicated to Arabic MWEs. Experimental results show that treating such a structure as a lexical entry improves the quality of translation.
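The abstract does not say which statistical measure is used to mine candidate expressions. As a rough illustration only, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), one common association measure for collocation extraction. The function name `pmi_bigrams`, the `min_count` threshold, and the English toy sentence are all assumptions made for this example and are not taken from the paper or from NooJ.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not the paper's method.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs give unreliable PMI estimates, so skip them.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs are the strongest MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input (assumed); a real run would use a tokenized corpus.
tokens = "the prime minister met the prime minister of the country".split()
ranked = pmi_bigrams(tokens)
for pair, score in ranked:
    print(pair, round(score, 3))
```

In a pipeline like the one the abstract outlines, candidates ranked this way would still need symbolic filtering (for example, morpho-syntactic patterns) before being treated as MWEs; that step is not shown here.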

Author information

Correspondence to Azeddin Rhazi.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Rhazi, A., Boulaalam, A. (2018). Corpus-Based Extraction and Translation of Arabic Multi-Words Expressions (MWEs). In: Mbarki, S., Mourchid, M., Silberztein, M. (eds) Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications. NooJ 2017. Communications in Computer and Information Science, vol 811. Springer, Cham. https://doi.org/10.1007/978-3-319-73420-0_12

  • DOI: https://doi.org/10.1007/978-3-319-73420-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73419-4

  • Online ISBN: 978-3-319-73420-0

  • eBook Packages: Computer Science, Computer Science (R0)
