The General Principles of the Diachronic Part of the Czech National Corpus

Kučera, Karel

doi:10.1007/3-540-48239-3_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

The diachronic part of the Czech National Corpus (CNC) has been organized as a general basis for the study of the entire history of Czech (from the 2nd half of the 13th century to 1990). It has been built around four principles, namely representativeness, authenticity, transcription, and preservation of the maximum amount of information contained in the text. The diachronic part of the CNC includes the corpus, a bank of transcribed texts, a bank of transliterated texts, a text archive, a language database, a dictionary database, and a control database storing information about the texts. The diachronic part of the CNC now includes about 1.5 million tokens.
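
The component structure listed in the abstract can be pictured with a small data model. The following is only an illustrative sketch, not the CNC's actual schema: the class `TextRecord`, its field names, the helper `texts_in`, and the choice of 1251 as the lower bound of "the 2nd half of the 13th century" are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Component(Enum):
    """The seven components named in the abstract."""
    CORPUS = "corpus"
    TRANSCRIBED_BANK = "bank of transcribed texts"
    TRANSLITERATED_BANK = "bank of transliterated texts"
    TEXT_ARCHIVE = "text archive"
    LANGUAGE_DATABASE = "language database"
    DICTIONARY_DATABASE = "dictionary database"
    CONTROL_DATABASE = "control database"


# Period covered per the abstract: 2nd half of the 13th century to 1990.
# The exact lower bound (1251) is an assumption made for this sketch.
PERIOD_START, PERIOD_END = 1251, 1990


@dataclass
class TextRecord:
    """Hypothetical control-database entry; all field names are illustrative."""
    text_id: str
    title: str
    year: int
    stored_in: Set[Component] = field(default_factory=set)

    def in_covered_period(self) -> bool:
        # True when the text's year falls inside the assumed period bounds.
        return PERIOD_START <= self.year <= PERIOD_END


def texts_in(records: List[TextRecord], component: Component) -> List[str]:
    """Return the ids of the records held in the given component."""
    return [r.text_id for r in records if component in r.stored_in]
```

In this sketch a single record can be held in several components at once, for example in both the corpus and the bank of transcribed texts; whether the real control database tracks texts this way is not stated in the abstract.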

This research was supported by the GACR, Grant Nr. 405/96/K214.

Author information

Authors and Affiliations

Institute of the Czech National Corpus, Faculty of Philosophy and Arts, Charles University, Prague
Karel Kučera

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Universitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka

About this paper

Cite this paper

Kučera, K. (1999). The General Principles of the Diachronic Part of the Czech National Corpus. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_11

DOI: https://doi.org/10.1007/3-540-48239-3_11
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive
