Abstract
This paper proposes two new measures of word dispersion in a language corpus: reduced frequency and rarity. Their calculation is described, and some results from the Czech National Corpus (CNC) are presented. Some previous approaches are briefly mentioned.
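The abstract does not reproduce the formulas. As an illustration only, the sketch below assumes the average reduced frequency (ARF) formulation associated with this line of CNC work. Take a corpus of N tokens and a word with f occurrences at positions p_1 < p_2 < ... < p_f, and let v = N/f be the length of one of f equal segments. Then ARF = (1/v) · Σ min(d_i, v), where d_i are the gaps between consecutive occurrences, with the corpus treated as circular so that the last gap wraps around to the first occurrence. A word spread evenly through the corpus scores close to its raw frequency f, while a word whose occurrences are bunched together scores close to 1. The paper's own definition of reduced frequency, and its rarity measure, may differ. The function name `average_reduced_frequency` is hypothetical.

```python
from typing import Sequence


def average_reduced_frequency(positions: Sequence[int], corpus_size: int) -> float:
    """Average reduced frequency (ARF) of one word; an assumed formulation.

    positions:   strictly increasing 0-based token offsets of the word.
    corpus_size: total number of tokens N in the corpus.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    if any(not 0 <= p < corpus_size for p in positions):
        raise ValueError("positions must lie in [0, corpus_size)")
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")

    v = corpus_size / f  # length of one of f equal segments
    # Gaps between consecutive occurrences; the corpus is treated as circular,
    # so the final gap runs from the last occurrence back to the first one.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    gaps.append(positions[0] + corpus_size - positions[-1])
    # Each gap contributes at most one segment length v.
    return sum(min(d, v) for d in gaps) / v


# Same raw frequency (4) in a 100-token corpus, very different dispersion.
print(average_reduced_frequency([0, 25, 50, 75], 100))  # 4.0  (evenly spread)
print(average_reduced_frequency([0, 1, 2, 3], 100))     # 1.12 (clustered)
```

Capping each gap at v is what makes the measure insensitive to how far apart well-separated occurrences are, while heavily clustered occurrences collapse toward a single "effective" occurrence.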
This research was supported by the Grant Agency of the Czech Republic (GAČR), Grant No. 405/96/K214.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hlaváčová, J., Rychlý, P. (1999). Dispersion of Words in a Language Corpus. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_58
DOI: https://doi.org/10.1007/3-540-48239-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive