Towards Comprehensive Computational Representations of Arabic Multiword Expressions

  • Conference paper

Computational and Corpus-Based Phraseology (EUROPHRAS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10596)

Abstract

A successful computational treatment of multiword expressions (MWEs) in natural languages makes NLP systems more robust, because MWEs are a primary source of the long-standing problem of language ambiguity. The first step in addressing this challenge is to build an extensive, reliable MWE language resource (LR) with comprehensive computational representations across all linguistic levels. Such a resource is the cornerstone for understanding the heterogeneous linguistic behaviour of MWEs in their various manifestations. This paper presents a detailed framework for the computational representation of Arabic MWEs (ArMWEs) across all linguistic levels. The framework is based on the state-of-the-art Lexical Markup Framework (LMF), with the modifications needed to suit the distinctive properties of Modern Standard Arabic (MSA). This work forms part of a larger project, JOMAL, which aims to develop a comprehensive computational lexicon of ArMWEs for NLP and language pedagogy (LP).

Notes

  1. http://multiword.sourceforge.net/PHITE.php?sitesig=FILES&page=FILES_20_Data_Sets

  2. https://sites.google.com/site/mwesurveytest/home

  3. The survey online form: https://goo.gl/eYz8qL

Author information

Correspondence to Ayman Alghamdi.

Appendix

XML fragment for the MWE fī ʾams alḥāja (figure not reproduced here).
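
The paper's own XML fragment is not available in this text. As a rough illustration only, the following minimal Python sketch shows how an LMF-style lexical entry for the MWE fī ʾams alḥāja might be assembled. The element names (LexicalEntry, Lemma, ListOfComponents, Component, feat) follow common LMF class names, but the attribute names and values used here (entryType, position, the id mwe_0001) and the helper functions are assumptions made for this example, not the authors' schema. Real LMF components normally point to separate lexical entries; this sketch simply stores each token's written form.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> child to parent."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def build_mwe_entry(entry_id, written_form, components):
    """Build a hypothetical LMF-style LexicalEntry for a multiword expression."""
    entry = ET.Element("LexicalEntry", {"id": entry_id})
    # "entryType" is an illustrative feature name, not taken from the paper.
    feat(entry, "entryType", "multiwordExpression")

    # The lemma carries the full written form of the expression.
    lemma = ET.SubElement(entry, "Lemma")
    feat(lemma, "writtenForm", written_form)

    # One Component per token, numbered from 1 in surface order.
    comps = ET.SubElement(entry, "ListOfComponents")
    for position, token in enumerate(components, start=1):
        comp = ET.SubElement(comps, "Component")
        feat(comp, "position", str(position))
        feat(comp, "writtenForm", token)
    return entry


mwe = "fī ʾams alḥāja"
entry = build_mwe_entry("mwe_0001", mwe, mwe.split())

# Serialize to a Unicode string so the transliteration stays readable.
xml_text = ET.tostring(entry, encoding="unicode")
```

Here `xml_text` holds the serialized entry. A fuller representation along the lines the abstract describes would add further features at each linguistic level (for example morphological, syntactic, and semantic information), which this sketch leaves out.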

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alghamdi, A., Atwell, E. (2017). Towards Comprehensive Computational Representations of Arabic Multiword Expressions. In: Mitkov, R. (ed.) Computational and Corpus-Based Phraseology. EUROPHRAS 2017. Lecture Notes in Computer Science, vol 10596. Springer, Cham. https://doi.org/10.1007/978-3-319-69805-2_29

  • DOI: https://doi.org/10.1007/978-3-319-69805-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69804-5

  • Online ISBN: 978-3-319-69805-2

  • eBook Packages: Computer Science (R0)
