Compound Terms and Their Multi-word Variants: Case of German and Russian Languages

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8403)

Abstract

The terminology of any language and any domain evolves continuously, leading to constant term renewal. Terms undergo a wide range of morphological and syntactic variations, which any NLP application has to handle. While the syntactic variations of multi-word terms have been described and tools have been designed to process them, only a few works have studied the syntagmatic variants of compound terms. This paper is dedicated to the identification of such variants, and more precisely to the detection of synonymic pairs of the form “compound term - multi-word term”. We describe a pipeline for their detection that runs from compound recognition and splitting, through multi-word term extraction, to the alignment of the variants with the original terms. The experiments are carried out for two compound-producing languages, German and Russian, and two specialised domains: wind energy and breast cancer. We identify variation patterns for these two languages and demonstrate that the transformation of a morphological compound into a syntagmatic compound mainly occurs when the term denomination needs to be enlarged.
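The pipeline outlined in the abstract (compound splitting followed by alignment with multi-word term candidates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy lexicon, the list of linking elements, and the function names `split_compound` and `is_variant` are assumptions made for illustration, and the multi-word term is assumed to arrive already lemmatised.

```python
from typing import List, Optional, Set

# Toy lexicon and linking elements: illustrative assumptions,
# not the resources used in the paper.
LEXICON: Set[str] = {"wind", "energie", "brust", "krebs", "anlage"}
LINKERS = ("", "s", "es", "n", "en")


def split_compound(word: str, lexicon: Set[str] = LEXICON) -> Optional[List[str]]:
    """Return the lexicon components of `word`, or None if no full split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    for i in range(1, len(word)):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        # Try each linking element between the head and the remainder.
        for linker in LINKERS:
            if linker and not rest.startswith(linker):
                continue
            tail = rest[len(linker):]
            if not tail:
                continue
            parts = split_compound(tail, lexicon)
            if parts is not None:
                return [head] + parts
    return None


def is_variant(compound: str, mwt_lemmas: List[str],
               lexicon: Set[str] = LEXICON) -> bool:
    """True if the lemmatised multi-word term contains every compound component."""
    parts = split_compound(compound, lexicon)
    if parts is None or len(parts) < 2:
        return False
    return set(parts) <= {lemma.lower() for lemma in mwt_lemmas}


if __name__ == "__main__":
    print(split_compound("Windenergie"))
    print(is_variant("Windenergie", ["Energie", "der", "Wind"]))
```

A dictionary-based recursive split is only one of several possible strategies; corpus-frequency or morphological-analyser approaches, such as those in the works the paper cites, would replace `split_compound` without changing the alignment step.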

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Clouet, E., Daille, B. (2014). Compound Terms and Their Multi-word Variants: Case of German and Russian Languages. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_6

  • DOI: https://doi.org/10.1007/978-3-642-54906-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54905-2

  • Online ISBN: 978-3-642-54906-9

  • eBook Packages: Computer Science (R0)
