
Verbal Multi-Word Expressions in Yiddish

  • Conference paper
  • Natural Language Processing and Information Systems (NLDB 2018)

Abstract

Verbal Multi-Word Expressions (VMWEs) are very common in many languages. Among other types, they include Verb-Particle Constructions (VPC) (e.g. get around), Light-Verb Constructions (LVC) (e.g. make a decision), and idioms (ID) (e.g. break a leg). In this paper, we present a new dataset for supervised learning of VMWEs written in Yiddish. The dataset was manually collected from a web resource and annotated. It contains a set of positive examples of VMWEs and a set of negative (non-VMWE) examples. Besides training supervised algorithms, the dataset's positive examples can serve as seeds for unsupervised bootstrapping algorithms. Moreover, we analyze the lexical properties of VMWEs written in Yiddish by classifying them into six categories: VPC, LVC, ID, Inherently Pronominal Verb (IPronV), Inherently Prepositional Verb (IPrepV), and other (OTH). The analysis highlights several features of VMWEs that merit further exploration. This dataset is a first step towards the automatic identification of VMWEs written in Yiddish, which is important for natural language understanding, generation, and translation systems.
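The abstract describes the dataset as positive and negative examples, with each positive example assigned to one of six categories, and notes that the positives can double as bootstrapping seeds. The sketch below is a hypothetical in-memory representation of that design, not the released file format (which the abstract does not describe). The names `Category`, `Example`, `training_pairs`, `bootstrap_seeds` and `category_counts` are illustrative only, and the toy rows are English: three reuse the abstract's own examples and "eat an apple" is an invented negative.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """The six VMWE categories named in the abstract."""
    VPC = "Verb-Particle Construction"
    LVC = "Light-Verb Construction"
    ID = "Idiom"
    IPronV = "Inherently Pronominal Verb"
    IPrepV = "Inherently Prepositional Verb"
    OTH = "Other"


@dataclass(frozen=True)
class Example:
    """Hypothetical record: one candidate expression and its annotation."""
    text: str
    is_vmwe: bool
    category: Optional[Category] = None

    def __post_init__(self):
        # Assumption: every positive example carries a category; negatives carry none.
        if self.is_vmwe != (self.category is not None):
            raise ValueError(f"inconsistent annotation for {self.text!r}")


def training_pairs(examples):
    """(text, label) pairs for a supervised binary VMWE classifier."""
    return [(ex.text, int(ex.is_vmwe)) for ex in examples]


def bootstrap_seeds(examples):
    """Positive examples only, usable as seeds for a bootstrapping algorithm."""
    return [ex.text for ex in examples if ex.is_vmwe]


def category_counts(examples):
    """Number of positive examples in each category."""
    return Counter(ex.category for ex in examples if ex.is_vmwe)


# English toy data (hypothetical): not drawn from the Yiddish dataset.
toy = [
    Example("get around", True, Category.VPC),
    Example("make a decision", True, Category.LVC),
    Example("break a leg", True, Category.ID),
    Example("eat an apple", False),
]

print(bootstrap_seeds(toy))  # ['get around', 'make a decision', 'break a leg']
```

Keeping the category optional and validating it against the label is a design assumption: it lets one list feed both the supervised classifier and the seed set without a separate file for each.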


Notes

  1. https://typo.uni-konstanz.de/parseme/index.php.

  2. Ashkenaz is the medieval Hebrew name for northern Europe and Germany.

  3. https://en.wikipedia.org/wiki/Yiddish.

  4. https://archive.org/details/nationalyiddishbookcenter.

  5. http://yiddish-periodicals.huji.ac.il/.

  6. ftp://babel.ling.upenn.edu/research-material/yiddish-corpus/.

  7. http://web-corpora.net/YNC/search/.

  8. http://yiddish.forward.com.

  9. To facilitate readability, we use a transliteration of Hebrew using Roman characters; the letters used, in Hebrew lexicographic order, are abgdhwzxTiklmns`pcqršt.

  10. https://typo.uni-konstanz.de/parseme/index.php/2-general/151-parseme-shared-task-pilot-annotation.

  11. http://proycon.github.io/folia/.

  12. http://liebeskind-chaya.blogspot.co.il/p/downloads.html.
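Note 9 lists the Roman letters used for transliteration in Hebrew lexicographic order, which pairs them one-to-one with the 22 base letters alef through tav. The sketch below builds that mapping. Mapping the five word-final forms (ך ם ן ף ץ) to their base letter's transliteration is an assumption the note does not state, and every other character, including Yiddish ligatures such as װ and vowel points, is passed through unchanged. The names `TRANSLIT` and `transliterate` are hypothetical.

```python
# The 22 base Hebrew letters in lexicographic order, alef .. tav.
HEBREW_LETTERS = "אבגדהוזחטיכלמנסעפצקרשת"
# Roman letters from note 9, in the same order ("\u0161" is š).
ROMAN = "abgdhwzxTiklmns`pcqr\u0161t"
assert len(HEBREW_LETTERS) == len(ROMAN) == 22

# Assumption: word-final forms share the transliteration of their base letter.
FINAL_FORMS = {"ך": "כ", "ם": "מ", "ן": "נ", "ף": "פ", "ץ": "צ"}

TRANSLIT = dict(zip(HEBREW_LETTERS, ROMAN))
TRANSLIT.update({final: TRANSLIT[base] for final, base in FINAL_FORMS.items()})


def transliterate(text: str) -> str:
    """Replace each mapped Hebrew letter by its Roman letter; keep all other characters."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


print(transliterate("שלום"))  # šlwm
```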


Acknowledgments

We would like to express our deep gratitude to Gitty Eithen, Bluma Zicherman, and Hindy Golomb, our research assistants, for carrying out the annotation process.

Author information

Correspondence to Chaya Liebeskind.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Liebeskind, C., HaCohen-Kerner, Y. (2018). Verbal Multi-Word Expressions in Yiddish. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_20


  • DOI: https://doi.org/10.1007/978-3-319-91947-8_20


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science, Computer Science (R0)
