Automated Detection of Morphemes Using Distributional Measurements

  • Conference paper

Abstract

Simply taking the distribution of linguistic elements as the basis for analysis was the guiding methodological principle of the researchers associated with so-called “American Structuralism”. This paper deals with the detection of morphemes in a large corpus of German by applying a simple distributional procedure: counting the number of potential successors of a given letter sequence within a word, a method reminiscent of proposals by Harris, Shannon and others. Morpheme boundaries can be read off heuristically from an increase in the potential successor count. Three different methods of identifying morpheme breaks are discussed, and an improvement of the method by transforming the graphemic representation into a partially phonemic one is proposed.
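The successor-count idea can be illustrated with a minimal sketch. This is not the author's implementation: the seven-word list stands in for the large German corpus, the end-of-word marker `#` and the function names are hypothetical, and the break rule used here (cut wherever the count rises above that of the preceding prefix) is only one simple reading of "an increase in the potential successor count", not necessarily any of the three methods discussed in the paper.

```python
from collections import defaultdict

# Toy vocabulary standing in for a corpus (an assumption for illustration).
WORDS = ["lesen", "leser", "lesbar", "lesung", "trinken", "trinker", "trinkbar"]


def successor_counts(words):
    """Map every prefix to the set of distinct letters that follow it."""
    successors = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            successors[word[:i]].add(word[i])
        successors[word].add("#")  # end of word counts as one possible successor
    return successors


def successor_profile(word, successors):
    """Successor count after each prefix word[:1], word[:2], ..., word."""
    return [len(successors.get(word[:i], ())) for i in range(1, len(word) + 1)]


def segment(word, successors):
    """Cut after each word-internal prefix whose count exceeds the previous one."""
    profile = successor_profile(word, successors)
    cuts = [i + 1 for i in range(1, len(profile) - 1) if profile[i] > profile[i - 1]]
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


TABLE = successor_counts(WORDS)
print(successor_profile("lesbar", TABLE))  # [1, 1, 3, 1, 1, 1]
print(segment("lesbar", TABLE))            # ['les', 'bar']
print(segment("trinkbar", TABLE))          # ['trink', 'bar']
```

In this toy vocabulary the prefix "les" can be followed by "e", "b" or "u", so its count jumps from 1 to 3 and a break is placed after it. On real corpus data the counts are far noisier, which is presumably why the choice of break criterion matters.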

A. Fenk pointed out to me that the method described does not, strictly speaking, use an “information theoretical measurement”, as the original title suggested. I agree with this appraisal and have accordingly replaced the term with “distributional measurements”, which, ultimately for historical reasons, might be more appropriate. Thanks to Gustav Vella for painstaking corrections of my “Enklisch”.

References

  • BERGENHOLTZ, H. and SCHAEDER, B. (1977): Die Wortarten des Deutschen. Klett, Stuttgart.

  • DÉJEAN, H. (1998): Morphemes as Necessary Concepts for Structures Discovery from Untagged Corpora. Workshop on Paradigms and Grounding in Natural Language Learning. Adelaide, 295–299.

  • EISENBERG, P. (1998): Grundriß der deutschen Grammatik. Band 1: Das Wort. Metzler, Stuttgart.

  • HARRIS, Z. (1951): Methods in Structural Linguistics. University of Chicago Press, Chicago.

  • HARRIS, Z. (1954): Distributional Structure. Word, 10.2-3, 146–162.

  • MANNING, C. D. and SCHÜTZE, H. (1999): Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.

  • SHANNON, C. E. (1951): Prediction and Entropy of Printed English. Bell System Technical Journal, 30, 50–64.


© 2005 Springer-Verlag Berlin · Heidelberg

Cite this paper

Benden, C. (2005). Automated Detection of Morphemes Using Distributional Measurements. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_57
