Adding Words to Manuscripts: From PagesXML to TEITOK

Janssen, Maarten

doi:10.1007/978-3-030-00066-0_13

Maarten Janssen ORCID: orcid.org/0000-0003-0272-6318

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11057)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

This article describes a two-step method for transcribing historical manuscripts. The first step uses a page-based representation, which makes it easy to transcribe the document page by page and line by line. The second step converts that representation to the text-based TEI/XML format so that the document becomes fully searchable.
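The conversion step can be illustrated with a minimal sketch. It is not the chapter's actual converter: the input below is a hypothetical, simplified page-based layout (real PAGE XML is namespaced and also stores regions and coordinates), and the function name `page_to_text_based` is an assumption made for illustration. The sketch turns page and line containers into the TEI milestone elements `<pb/>` and `<lb/>`, and wraps each whitespace-separated word in a `<w>` element so that the text becomes one continuous, word-addressable sequence.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page-based input (NOT the real PAGE XML schema):
# pages contain lines, and each line holds its transcribed text.
PAGE_BASED = """
<document>
  <page n="1">
    <line>In the beginning of</line>
    <line>the year we set out</line>
  </page>
  <page n="2">
    <line>for the coast</line>
  </page>
</document>
"""


def page_to_text_based(source: str) -> ET.Element:
    """Flatten pages and lines into one running text with milestone elements."""
    doc = ET.fromstring(source)
    tei = ET.Element("TEI")
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    paragraph = ET.SubElement(body, "p")
    for page in doc.iter("page"):
        # A page becomes an empty <pb/> milestone instead of a container.
        ET.SubElement(paragraph, "pb", n=page.get("n", ""))
        for line in page.iter("line"):
            # Likewise, a line becomes an empty <lb/> milestone.
            ET.SubElement(paragraph, "lb")
            for token in (line.text or "").split():
                # Naive whitespace tokenization: one <w> element per word.
                ET.SubElement(paragraph, "w").text = token
    return tei


tei_root = page_to_text_based(PAGE_BASED)
words = [w.text for w in tei_root.iter("w")]
print(ET.tostring(tei_root, encoding="unicode"))
```

Because pages and lines are reduced to empty milestones, a phrase that runs across a line or page break stays contiguous in the word sequence, which is what makes the text-based form easier to search than the page-based one.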

Author information

Authors and Affiliations

CELGA-ILTEC, Universidade de Coimbra, Coimbra, Portugal
Maarten Janssen

Corresponding author

Correspondence to Maarten Janssen.

Editor information

Editors and Affiliations

University Carlos III, Madrid, Spain
Eva Méndez
USI, Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Cristina Ribeiro
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Gabriel David
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
João Correia Lopes

About this paper

Cite this paper

Janssen, M. (2018). Adding Words to Manuscripts: From PagesXML to TEITOK. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_13

DOI: https://doi.org/10.1007/978-3-030-00066-0_13
Published: 05 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer Science (R0)
