
Statistical Machine Translation of German Compound Words

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Abstract

German compound words pose special problems for statistical machine translation systems: the occurrence of each component in the training data is not sufficient for successful translation. Even if the compound itself has been seen during training, the system may not be capable of translating it properly into two or more words. If German is the target language, the system might generate only the separate components or fail to choose the correct compound. In this work, we investigate and compare different strategies for the treatment of German compound words in statistical machine translation systems. For translation from German, we compare linguistic and corpus-based compound splitting. For translation into German, we investigate splitting and rejoining German compounds, as well as joining potential English components. Additionally, we investigate word alignments enhanced with knowledge about the splitting points of German compounds. The translation quality is consistently improved by all methods for both translation directions.
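The abstract contrasts linguistic and corpus-based compound splitting. As a rough illustration of the corpus-based idea, the sketch below implements frequency-based splitting in the style of Koehn and Knight (2003): enumerate every way to write a word as a sequence of components seen in the corpus (allowing a few linking elements), keep the unsplit word as a candidate too, and choose the candidate whose component frequencies have the highest geometric mean. This is not necessarily the exact splitter used in this paper. The toy counts in `FREQ`, the filler set `("", "s", "es")`, the minimum component length of 3, lowercasing, and the function names `segmentations` and `best_split` are all hypothetical choices made for illustration.

```python
from math import log
from typing import Dict, List

# Toy lowercase unigram counts standing in for a German training corpus
# (hypothetical numbers, for illustration only).
FREQ: Dict[str, int] = {
    "haus": 500, "tür": 300, "haustür": 20,
    "arbeit": 400, "markt": 350,
}

# Linking elements ("Fugenelemente") allowed between components; this small
# set is an illustrative assumption, not the paper's inventory.
FILLERS = ("", "s", "es")


def segmentations(word: str, freq: Dict[str, int], min_len: int = 3) -> List[List[str]]:
    """Return every way to write `word` as a sequence of known components.

    The unsplit word counts as a one-part segmentation if it is in `freq`.
    Linking elements are dropped from the returned components.
    """
    results: List[List[str]] = []
    if word in freq:
        results.append([word])
    # Cut positions leave at least `min_len` characters on each side.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        for filler in FILLERS:
            if not head.endswith(filler):
                continue
            stem = head[: len(head) - len(filler)]
            if len(stem) >= min_len and stem in freq:
                # Recursively split the remainder into known components.
                for rest in segmentations(tail, freq, min_len):
                    results.append([stem] + rest)
    return results


def best_split(word: str, freq: Dict[str, int], min_len: int = 3) -> List[str]:
    """Pick the segmentation with the highest geometric mean of frequencies."""
    w = word.lower()  # simplification: casing is ignored
    options = segmentations(w, freq, min_len)
    if not options:
        return [w]  # unknown and unsplittable: leave the word as-is
    # Mean log-frequency ranks candidates exactly like the geometric mean,
    # but avoids overflow from multiplying large counts.
    return max(options, key=lambda parts: sum(log(freq[p]) for p in parts) / len(parts))


print(best_split("Haustür", FREQ))       # ['haus', 'tür']
print(best_split("Arbeitsmarkt", FREQ))  # ['arbeit', 'markt']
```

Because the unsplit word competes with its own splits, a compound that is frequent enough in the corpus survives unsplit. The ranking is purely frequency-driven, so it can also propose linguistically wrong splits, which is one reason to compare it against a linguistic splitter.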




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Popović, M., Stein, D., Ney, H. (2006). Statistical Machine Translation of German Compound Words. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_61


  • DOI: https://doi.org/10.1007/11816508_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37334-6

  • Online ISBN: 978-3-540-37336-0

  • eBook Packages: Computer Science (R0)
