Abstract
This paper presents a method and an apparatus for recognizing the text in a set of digital images of pages from ancient manuscripts or printed books. The method comprises the following macro steps: identifying the regions that contain words in a subset of the images and connecting them in sequence; building a thesaurus of the fonts used in those regions; and performing character recognition on one or more images in the set, associating a first efficiency value with this recognition. The prototype is patent pending (National Pat. Pend. n. BA2011A000038; Intern. Pat. Pend. n. I116-PCT).
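The abstract does not give implementation details, so the following is only a minimal sketch of how the three macro steps could fit together, not the patented method. Everything in it is an assumption made for illustration: the `Region` type, the `reading_order`, `similarity` and `recognize` helpers, the representation of the font thesaurus as a dictionary of tiny binary glyph templates, and the definition of the efficiency value as the mean best-match score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical word region: a bounding box in pixel coordinates."""
    x: int
    y: int
    w: int
    h: int


def reading_order(regions, line_tolerance=0.5):
    """Step 1 (assumed): group regions into text lines, then read left to right."""
    lines = []  # each entry holds the regions of one text line
    for r in sorted(regions, key=lambda r: r.y + r.h / 2):
        center = r.y + r.h / 2
        if lines:
            last = lines[-1]
            ref = sum(q.y + q.h / 2 for q in last) / len(last)
            # Same line if the vertical centers are close relative to the height.
            if abs(center - ref) <= line_tolerance * r.h:
                last.append(r)
                continue
        lines.append([r])
    return [r for line in lines for r in sorted(line, key=lambda r: r.x)]


def similarity(a, b):
    """Fraction of equal pixels between two same-sized binary glyphs."""
    cells = [pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(cells) / len(cells)


def recognize(glyphs, thesaurus):
    """Step 3 (assumed): label each glyph with its best-matching template.

    Returns the recognized text and an efficiency value, defined here
    (as an assumption) as the mean best-match similarity.
    """
    text, scores = [], []
    for g in glyphs:
        label, score = max(
            ((lab, similarity(g, tpl))
             for lab, tpls in thesaurus.items() for tpl in tpls),
            key=lambda pair: pair[1],
        )
        text.append(label)
        scores.append(score)
    efficiency = sum(scores) / len(scores) if scores else 0.0
    return "".join(text), efficiency


# Step 2 (assumed): a toy thesaurus mapping labels to 3x3 glyph templates.
thesaurus = {
    "I": [("010", "010", "010")],
    "L": [("100", "100", "111")],
}

regions = [Region(120, 10, 40, 20), Region(10, 12, 50, 20), Region(10, 50, 30, 20)]
ordered = reading_order(regions)

# The second glyph is a slightly degraded "L".
glyphs = [("010", "010", "010"), ("100", "100", "110")]
text, efficiency = recognize(glyphs, thesaurus)
print(ordered, text, round(efficiency, 3))
```

In this toy setting a low efficiency value could be used to decide that the thesaurus needs more templates before further pages are processed, which is one plausible reading of the "self-learning" aspect named in the paper's title.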
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barbuti, N., Caldarola, T. (2013). An Innovative Character Recognition for Ancient Book and Archival Materials: A Segmentation and Self-learning Based Approach. In: Agosti, M., Esposito, F., Ferilli, S., Ferro, N. (eds) Digital Libraries and Archives. IRCDL 2012. Communications in Computer and Information Science, vol 354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35834-0_26
DOI: https://doi.org/10.1007/978-3-642-35834-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35833-3
Online ISBN: 978-3-642-35834-0
eBook Packages: Computer Science (R0)