German Decompounding in a Difficult Corpus

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Abstract

Splitting compound words has proved useful in areas such as Machine Translation, Speech Recognition, and Information Retrieval (IR). IR systems usually have to cope with noisy data, as user queries are often written quickly and submitted without review. This work attempts to improve on current approaches to German decompounding when applied to query keywords. The results show an increase of more than 10% in accuracy compared to other state-of-the-art methods.



Editor information

Alexander Gelbukh

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alfonseca, E., Bilac, S., Pharies, S. (2008). German Decompounding in a Difficult Corpus. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_12

  • DOI: https://doi.org/10.1007/978-3-540-78135-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78134-9

  • Online ISBN: 978-3-540-78135-6

  • eBook Packages: Computer Science, Computer Science (R0)
