Abstract
This article describes a two-step method for transcribing historic manuscripts. In this method, the first step uses a page-based representation making it easy to transcribe the document page-by-page and line-by-line, while the second step converts this to the TEI/XML text-based format, in order to make sure the document becomes fully searchable.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Janssen, M.: TEITOK: text-faithful annotated corpora. In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, pp. 4037–4043 (2016)
Bilansky, A.: TypeWright: an experiment in participatory curation. Digit. Hum. Q. 9(4) (2015)
Evert, S., Hardie, A.: Twenty-first century corpus workbench: updating a query architecture for the new millennium. In: Corpus Linguistics 2011 (2011)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Janssen, M. (2018). Adding Words to Manuscripts: From PagesXML to TEITOK. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science(), vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_13
Download citation
DOI: https://doi.org/10.1007/978-3-030-00066-0_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer ScienceComputer Science (R0)