Abstract
The terminology of any language and any domain evolves continuously, which leads to constant term renewal. Terms undergo a wide range of morphological and syntactic variations that any NLP application has to handle. While the syntactic variations of multi-word terms have been described and tools have been designed to process them, only a few works have studied the syntagmatic variants of compound terms. This paper is dedicated to the identification of such variants, and more precisely to the detection of synonymic pairs of the form “compound term - multi-word term”. We describe a pipeline for their detection, from compound recognition and splitting, through multi-word term extraction, to the alignment of the variants with the original terms. The experiments are carried out for two compound-producing languages, German and Russian, and two specialised domains: wind energy and breast cancer. We identify variation patterns for these two languages and show that a morphological compound is transformed into a syntagmatic compound mainly when the term denomination needs to be enlarged.
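The pipeline outlined in the abstract (compound splitting, then alignment of a compound with a multi-word candidate) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy lexicon, the list of linking elements, the stopword list, and the function names `split_compound` and `is_variant` are all assumptions made for illustration. The sketch also assumes lemmatised input and requires an exact match between components, whereas the abstract notes that real variants often enlarge the term.

```python
from typing import List, Optional, Set

# Hypothetical toy lexicon of known lemmas (assumption, not from the paper).
LEXICON: Set[str] = {"wind", "energie", "anlage", "kraft", "arbeit", "zeit"}
# A few German linking elements, longest first; "" means no linking element.
LINKERS = ("es", "s", "n", "")
# Hypothetical list of function words ignored during alignment.
STOPWORDS: Set[str] = {"für", "von", "der", "die", "das", "zur", "zum"}


def split_compound(word: str, lexicon: Set[str] = LEXICON) -> Optional[List[str]]:
    """Split `word` into lexicon entries; return None if no full split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest possible first component first.
    for i in range(len(word) - 1, 1, -1):
        head = word[:i]
        for linker in LINKERS:
            if linker and not head.endswith(linker):
                continue
            stem = head[: len(head) - len(linker)] if linker else head
            if stem in lexicon:
                rest = split_compound(word[i:], lexicon)
                if rest:
                    return [stem] + rest
    return None


def is_variant(compound: str, mwt_lemmas: List[str],
               lexicon: Set[str] = LEXICON) -> bool:
    """True if the multi-word term uses exactly the compound's components."""
    parts = split_compound(compound, lexicon)
    if not parts or len(parts) < 2:
        return False  # not recognised as a compound
    mwt_parts: List[str] = []
    for lemma in mwt_lemmas:
        if lemma.lower() in STOPWORDS:
            continue
        sub = split_compound(lemma, lexicon)
        if sub is None:
            return False  # unknown content word, no alignment
        mwt_parts.extend(sub)
    return sorted(parts) == sorted(mwt_parts)


if __name__ == "__main__":
    print(split_compound("Windenergieanlage"))
    print(is_variant("Windenergieanlage", ["Anlage", "für", "Windenergie"]))
```

With this toy lexicon, `Windenergieanlage` and the lemma sequence `Anlage für Windenergie` are paired as a candidate "compound term - multi-word term" variant. A realistic system would replace the lexicon with corpus-derived vocabulary and relax the exact-match condition to allow inserted modifiers.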
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Clouet, E., Daille, B. (2014). Compound Terms and Their Multi-word Variants: Case of German and Russian Languages. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_6
DOI: https://doi.org/10.1007/978-3-642-54906-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science, Computer Science (R0)