ABSTRACT
This paper addresses the faithful visualization of historical documents on e-book readers and tablet computers. To this end, digitized books must be converted to re-flowable formats in which the characters can easily be resized. The document is first analyzed to extract its characters, which are then clustered and replaced by prototypes. The prototypes are represented as SVG objects and arranged in their proper positions in the converted document.
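The abstract does not specify the clustering algorithm or how prototypes are vectorized, so the following is only a minimal sketch under stated assumptions. It assumes glyphs have already been segmented into equally sized binary bitmaps. It groups them with a simple threshold-based leader algorithm on Hamming distance, which is a swapped-in technique and not necessarily the authors' method. Each prototype is the pixel-wise majority vote of its cluster and is written as an SVG `<symbol>` built from unit `<rect>` pixels. A real system would more likely trace smooth outlines with a vectorizer such as Potrace. Every glyph occurrence then becomes a `<use>` element at the glyph's original coordinates. All names (`cluster_glyphs`, `to_svg`, `Cluster`, `max_dist`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical representation: a binary glyph image, 1 = ink, 0 = background.
Bitmap = tuple[tuple[int, ...], ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


@dataclass
class Cluster:
    members: list[Bitmap] = field(default_factory=list)

    def prototype(self) -> Bitmap:
        """Pixel-wise majority vote; ties resolve to background."""
        n = len(self.members)
        rows, cols = len(self.members[0]), len(self.members[0][0])
        return tuple(
            tuple(int(2 * sum(m[r][c] for m in self.members) > n) for c in range(cols))
            for r in range(rows)
        )


def cluster_glyphs(glyphs: list[Bitmap], max_dist: int) -> tuple[list[Cluster], list[int]]:
    """Leader clustering (assumed): join the nearest prototype within max_dist, else start a new cluster."""
    clusters: list[Cluster] = []
    labels: list[int] = []
    for g in glyphs:
        best, best_d = -1, max_dist + 1
        for i, c in enumerate(clusters):
            d = hamming(g, c.prototype())
            if d < best_d:
                best, best_d = i, d
        if best == -1:
            clusters.append(Cluster([g]))
            best = len(clusters) - 1
        else:
            clusters[best].members.append(g)
        labels.append(best)
    return clusters, labels


def to_svg(prototypes: list[Bitmap], placements: list[tuple[int, int, int]],
           width: int, height: int) -> str:
    """One <symbol> per prototype, one <use> per (label, x, y) glyph occurrence."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'xmlns:xlink="http://www.w3.org/1999/xlink" width="{width}" height="{height}">',
        "<defs>",
    ]
    for i, p in enumerate(prototypes):
        parts.append(f'<symbol id="p{i}">')
        for r, row in enumerate(p):
            for c, v in enumerate(row):
                if v:  # pixel squares stand in for traced vector outlines
                    parts.append(f'<rect x="{c}" y="{r}" width="1" height="1"/>')
        parts.append("</symbol>")
    parts.append("</defs>")
    for label, x, y in placements:
        # xlink:href for SVG 1.1 renderers; SVG 2 also accepts plain href
        parts.append(f'<use xlink:href="#p{label}" x="{x}" y="{y}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    plus = ((0, 1, 0), (1, 1, 1), (0, 1, 0))
    noisy_plus = ((0, 1, 0), (1, 1, 1), (0, 0, 0))
    cross = ((1, 0, 1), (0, 1, 0), (1, 0, 1))
    clusters, labels = cluster_glyphs([plus, noisy_plus, cross], max_dist=2)
    print(labels)  # [0, 0, 1]
    protos = [c.prototype() for c in clusters]
    print(to_svg(protos, [(lab, 10 * i, 0) for i, lab in enumerate(labels)], 40, 10))
```

Sharing one `<symbol>` per prototype keeps the output compact: a character that occurs many times is stored once and referenced by each occurrence.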
Among other applications, the proposed conversion allows visitors to archives and exhibitions to browse and consult historical documents easily, either on dedicated devices or on their own mobile devices that support standard re-flowable formats.
The system is evaluated quantitatively on the well-known UW-I dataset by comparing the OCR errors obtained on the original images with those obtained on the reconstructed ones. The visual rendering of historical documents is assessed on a digitized 19th-century book.
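The abstract does not define the OCR error measure. A common choice, assumed here, is the character error rate: the Levenshtein edit distance between the OCR output and the ground-truth transcription, divided by the ground-truth length. Computing it for the OCR output (for example from an engine such as Tesseract) on both the original page image and the reconstructed one makes the two directly comparable. A reconstructed rate close to the original one would suggest that replacing glyphs by prototypes did not harm legibility. The function names are hypothetical and the example strings are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def char_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance normalized by ground-truth length (guarding against empty truth)."""
    return levenshtein(ocr_text, ground_truth) / max(len(ground_truth), 1)


if __name__ == "__main__":
    truth = "historical"
    ocr_on_original = "hlstorical"       # made-up OCR output, one substitution
    ocr_on_reconstructed = "historical"  # made-up OCR output, no errors
    print(char_error_rate(ocr_on_original, truth))       # 0.1
    print(char_error_rate(ocr_on_reconstructed, truth))  # 0.0
```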
Towards a faithful visualization of historical books on e-book readers