Abstract
It is shown that word length and other properties of linguistic units display lawful behavior not only in the form of distributions but also with respect to their syntagmatic arrangement in a text. Based on L-segments (units of constant or increasing length), F-segments, and T-segments (units of constant or increasing frequency or polytextuality, respectively), the dynamic behavior of segment patterns is investigated. Theoretical models are derived from plausible assumptions about how the properties of individual units influence the properties of their constituents in the text. The corresponding hypotheses are tested on data from 66 German texts by four authors in two different genres. Experiments with various characteristics reveal promising properties that could be useful for author and/or genre discrimination.
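To make the segment definitions concrete, here is a minimal sketch (not the authors' implementation) that splits a sequence of values into maximal runs of constant or increasing values. Feeding it word lengths yields L-segments, per-token frequencies yield F-segments, and per-token polytextuality (here assumed to mean the number of texts in a corpus that contain the word) yields T-segments. It assumes word lengths are already counted (e.g. in syllables) and uses naive whitespace tokenization; all function names and sample data are hypothetical, and the chapter's own handling of details such as sentence boundaries may differ.

```python
from collections import Counter
from typing import Sequence


def segment(values: Sequence[int]) -> list[list[int]]:
    """Split values into maximal runs where each value is >= the previous one."""
    segments: list[list[int]] = []
    for v in values:
        if segments and v >= segments[-1][-1]:
            segments[-1].append(v)  # constant or increasing: extend current run
        else:
            segments.append([v])  # a decrease starts a new segment
    return segments


def frequencies(tokens: Sequence[str]) -> list[int]:
    """Replace each token by its frequency within the same text (for F-segments)."""
    counts = Counter(tokens)
    return [counts[t] for t in tokens]


def polytextuality(tokens: Sequence[str], corpus: Sequence[Sequence[str]]) -> list[int]:
    """Replace each token by the number of corpus texts containing it (for T-segments).

    Assumption: polytextuality is taken as a simple document count.
    """
    doc_sets = [set(doc) for doc in corpus]
    return [sum(t in d for d in doc_sets) for t in tokens]


def segment_length_distribution(segments: Sequence[Sequence[int]]) -> Counter:
    """Count how many segments have each length (number of units per segment)."""
    return Counter(len(s) for s in segments)


if __name__ == "__main__":
    # Hypothetical word lengths in syllables for an eight-word passage.
    word_lengths = [1, 2, 2, 1, 3, 1, 1, 2]
    l_segments = segment(word_lengths)
    print(l_segments)  # [[1, 2, 2], [1, 3], [1, 1, 2]]
    print(segment_length_distribution(l_segments))

    tokens = "the cat saw the dog and the cat ran".split()
    print(segment(frequencies(tokens)))  # F-segments
```

Because the same `segment` function serves all three segment types, they differ only in which property is fed into it; the resulting segment-length counts are the kind of data to which distribution models could then be fitted.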
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Köhler, R., Naumann, S. (2008). Quantitative Text Analysis Using L-, F- and T-Segments. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_75
DOI: https://doi.org/10.1007/978-3-540-78246-9_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics (R0)