Automated Detection of Morphemes Using Distributional Measurements

Benden, Christoph

doi:10.1007/3-540-28084-7_57

Christoph Benden

Conference paper

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Simply taking the distribution of linguistic elements as a basis for analysis was the guiding methodological principle of the researchers of so-called “American Structuralism”. This paper deals with the detection of morphemes in a large corpus of German by applying a simple distributional procedure: counting the number of potential successors of a given sequence of letters within a word, a method reminiscent of proposals by Harris, Shannon and others. Morpheme boundaries can be read off heuristically from an increase in the potential successor count. Three different methods of identifying morpheme breaks are discussed, and an improvement of the method is proposed that transforms the graphemic representation into a partially phonemic one.
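The successor-count idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy word list, the end-of-word marker `#`, the function names, and the specific break rule (cut wherever the count rises above that of the preceding prefix) are all assumptions made for illustration, and the abstract does not specify the three break-detection methods the paper discusses.

```python
from collections import defaultdict


def build_successor_table(words):
    """Map every prefix in the word list to the set of symbols that follow it."""
    successors = defaultdict(set)
    for word in words:
        padded = word + "#"  # "#" marks end of word (assumed convention)
        for i in range(len(padded)):
            successors[padded[:i]].add(padded[i])
    return successors


def successor_counts(word, successors):
    """Successor count after each prefix word[:1], word[:2], ..., word."""
    return [len(successors.get(word[:i], ())) for i in range(1, len(word) + 1)]


def split_on_increase(word, successors):
    """Cut after any prefix whose count exceeds that of the previous prefix.

    Hypothetical break rule chosen for illustration only.
    """
    counts = successor_counts(word, successors)
    segments, start = [], 0
    for i in range(1, len(counts)):
        # counts[i] belongs to the prefix word[:i + 1]; never cut at word end
        if counts[i] > counts[i - 1] and i + 1 < len(word):
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


# Toy lexicon standing in for the large German corpus (assumed data).
toy_lexicon = ["lesen", "leser", "lesbar", "lesung", "kaufen", "kaufbar", "kaufst"]
table = build_successor_table(toy_lexicon)

print(successor_counts("lesbar", table))   # [1, 1, 3, 1, 1, 1]
print(split_on_increase("lesbar", table))  # ['les', 'bar']
print(split_on_increase("kaufen", table))  # ['kauf', 'en']
```

In this toy lexicon the count jumps to 3 after "les" and "kauf" because three different letters can follow each stem, which is the kind of increase the abstract describes. On a real corpus the counts would be much noisier, so a simple rise-over-previous rule like this one would likely need a threshold or smoothing.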

A. Fenk pointed out to me that the method described does not, strictly speaking, use an “information theoretical measurement”, as the original title suggested. I agree with this appraisal and have accordingly replaced the term with “distributional measurements”, which, ultimately for historical reasons, might be more appropriate. Thanks to Gustav Vella for painstaking corrections of my “Enklisch”.

References

BERGENHOLTZ, H. and SCHAEDER, B. (1977): Die Wortarten des Deutschen. Klett, Stuttgart.
DÉJEAN, H. (1998): Morphemes as Necessary Concepts for Structures Discovery from Untagged Corpora. Workshop on Paradigms and Grounding in Natural Language Learning. Adelaide, 295–299.
EISENBERG, P. (1998): Grundriß der deutschen Grammatik. Band 1: Das Wort. Metzler, Stuttgart.
HARRIS, Z. (1951): Methods in Structural Linguistics. University of Chicago Press, Chicago.
HARRIS, Z. (1954): Distributional Structure. Word, 10.2-3, 146–162.
MANNING, C. D. and SCHÜTZE, H. (1999): Foundations of Statistical Natural Language Processing. MIT-Press, Cambridge, MA.
SHANNON, C. E. (1950): Prediction and Entropy of Printed English. Bell System Technical Journal, 3, 50–64.

Author information

Authors and Affiliations

Department of Linguistics - Linguistic Data Processing, University of Cologne, 50923, Köln
Christoph Benden

Editor information

Editors and Affiliations

Fachbereich Statistik, Universität Dortmund, 44221, Dortmund
Claus Weihs
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Benden, C. (2005). Automated Detection of Morphemes Using Distributional Measurements. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_57

DOI: https://doi.org/10.1007/3-540-28084-7_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics (R0)
