Abstract
This paper describes the results of initial work aimed at increasing the number and improving the quality of textual versions of historical documents available in Polish digital libraries. The digital libraries community lacks tools that integrate the existing digitisation workflow with a customizable OCR engine and crowd-based text correction; this paper describes work on providing such a solution. Apart from the current state of the art in this field, the paper includes a description of the Virtual Transcription Laboratory (VTL) prototype, a crowdsourcing platform that utilizes the Tesseract OCR engine. The last chapter outlines the results of the prototype's evaluation on a real-life dataset of historical documents from the IMPACT project. The results demonstrate the applicability of the proposed solution as an enhancement of the digitisation workflow.
The presented results were developed as part of PSNC activities within the scope of the SYNAT project (http://www.synat.pl), funded by the Polish National Center for Research and Development (grant no. SP/I/1/77065/10).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dudczak, A., Kmieciak, M., Werla, M. (2012). Creation of Textual Versions of Historical Documents from Polish Digital Libraries. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_10
DOI: https://doi.org/10.1007/978-3-642-33290-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33289-0
Online ISBN: 978-3-642-33290-6
eBook Packages: Computer Science (R0)