Abstract
Taking the distribution of linguistic elements as the sole basis for analysis was the methodological premise of the researchers of so-called “American Structuralism”. This paper deals with the detection of morphemes in a large German corpus by a purely distributional procedure: counting the number of potential successors of a given letter sequence within a word, a method reminiscent of proposals by Harris, Shannon and others. Morpheme breaks can be heuristically read off from an increase in this potential successor count. Three different methods of identifying morpheme breaks are discussed, and an improvement of the method is proposed: transforming the graphemic representation into a partial phonemic one.
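The successor-count idea can be illustrated with a small sketch. It is not the author's implementation. The toy corpus, the end-of-word marker `#`, the "peak" rule and the `min_count` threshold are all assumptions made for this example, and the paper itself compares three different ways of locating breaks. For every prefix of a word, the sketch counts how many distinct symbols follow that prefix anywhere in the corpus. It then proposes a cut wherever that count rises above both neighbouring positions.

```python
from collections import defaultdict

END = "#"  # hypothetical end-of-word marker, counted as one possible successor


def successor_table(words):
    """Map every prefix occurring in the corpus to the set of symbols that follow it."""
    table = defaultdict(set)
    for w in words:
        for i in range(len(w)):
            table[w[:i]].add(w[i])
        table[w].add(END)  # the complete word can also be followed by a word end
    return table


def successor_counts(word, table):
    """Successor count of each prefix word[:1], word[:2], ..., word[:len(word)]."""
    # dict.get does not trigger the defaultdict factory, so unseen prefixes count 0
    return [len(table.get(word[:k], ())) for k in range(1, len(word) + 1)]


def segment(word, table, min_count=3):
    """Cut after a prefix whose count exceeds both neighbours and reaches min_count.

    The peak rule and min_count are illustrative assumptions, not the paper's method.
    """
    c = successor_counts(word, table)  # c[k - 1] is the count after word[:k]
    cuts = [
        k
        for k in range(2, len(word))
        if c[k - 1] > c[k - 2] and c[k - 1] > c[k] and c[k - 1] >= min_count
    ]
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy corpus; the paper uses a large German corpus instead.
corpus = ["spiel", "spielen", "spielt", "spielte", "spielzeug",
          "spinne", "sprechen", "sprecher"]
table = successor_table(corpus)
print(successor_counts("spielen", table))  # [1, 2, 2, 1, 4, 1, 1]
print(segment("spielen", table))           # ['spiel', 'en']
print(segment("spielzeug", table))         # ['spiel', 'zeug']
```

After the prefix `spiel`, four different continuations occur (`e`, `t`, `z` and the word end), so the count jumps from 1 to 4 and a break is proposed there. Without the `min_count` threshold, `sprechen` would also be cut after `sp` and after `spreche`, because of the small peaks of 2 produced by `spinne` and `sprecher`. This shows how sensitive such a heuristic is to the corpus.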
A. Fenk pointed out to me that, strictly speaking, the method described does not use an “information-theoretical measurement”, as the original title suggested. I agree with this appraisal and have accordingly replaced the term with “distributional measurements”, which, ultimately for historical reasons, may be more appropriate. Thanks to Gustav Vella for painstaking corrections of my “Enklisch”.
References
BERGENHOLTZ, H. and SCHAEDER, B. (1977): Die Wortarten des Deutschen. Klett, Stuttgart.
DÉJEAN, H. (1998): Morphemes as Necessary Concepts for Structures Discovery from Untagged Corpora. Workshop on Paradigms and Grounding in Natural Language Learning. Adelaide, 295–299.
EISENBERG, P. (1998): Grundriß der deutschen Grammatik. Band 1: Das Wort. Metzler, Stuttgart.
HARRIS, Z. (1951): Methods in Structural Linguistics. University of Chicago Press, Chicago.
HARRIS, Z. (1954): Distributional Structure. Word, 10.2-3, 146–162.
MANNING, C. D. and SCHÜTZE, H. (1999): Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
SHANNON, C. E. (1950): Prediction and Entropy of Printed English. Bell System Technical Journal, 30, 50–64.
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Benden, C. (2005). Automated Detection of Morphemes Using Distributional Measurements. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_57
DOI: https://doi.org/10.1007/3-540-28084-7_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics