
Creation of Textual Versions of Historical Documents from Polish Digital Libraries

  • Conference paper
Theory and Practice of Digital Libraries (TPDL 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7489)

Abstract

This paper describes the results of initial work aimed at increasing the number and improving the quality of textual versions of the historical documents available in Polish digital libraries. The digital libraries community lacks tools that integrate the existing digitisation workflow with a customizable OCR engine and crowd-based text correction; this paper describes work on providing such a solution. Apart from a review of the current state of the art in this field, the paper includes a description of the Virtual Transcription Laboratory (VTL) prototype, a crowdsourcing platform that utilizes the Tesseract OCR engine. The last chapter outlines the results of the prototype’s evaluation on a real-life dataset of historical documents from the IMPACT project. The results demonstrate the applicability of the proposed solution as an enhancement of the digitisation workflow.

The presented results were developed as part of PSNC activities within the scope of the SYNAT project (http://www.synat.pl), funded by the Polish National Center for Research and Development (grant no. SP/I/1/77065/10).




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dudczak, A., Kmieciak, M., Werla, M. (2012). Creation of Textual Versions of Historical Documents from Polish Digital Libraries. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_10

  • DOI: https://doi.org/10.1007/978-3-642-33290-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33289-0

  • Online ISBN: 978-3-642-33290-6

  • eBook Packages: Computer Science, Computer Science (R0)
