Abstract
This article addresses the restoration of character shapes in images of antique documents. Documents of this class generally carry a great deal of unintended historical information, which must be preserved if digital libraries are to be of high quality. Many document processing methods have already been proposed to cope with degraded character images, but these techniques often replace the degraded shape with a corresponding prototype, which many specialists find unsatisfactory. We therefore developed our own method for accurate character restoration, building on generic image processing tools (namely Gabor filtering and the active contour model) complemented by structural information that is extracted automatically. The principle of the method is to make an active contour recover the lost information by means of an external energy term derived from a reference character image that is built and selected automatically. Results are presented for real examples taken from printed and handwritten documents.
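The principle stated in the abstract, an active contour driven by an external energy tied to a reference shape, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`reference_energy_gradient`, `evolve_snake`), the parameter values, and the use of an analytic circle as a stand-in for the reference character image are all assumptions made for illustration. The internal energy is reduced to a single elasticity term, and the contour is updated by plain explicit gradient descent.

```python
import numpy as np

def reference_energy_gradient(pts, center, radius):
    """Gradient of a hypothetical external energy E = 0.5 * (|p - c| - R)^2.

    The circle (center, radius) stands in for the boundary of a reference
    character image; a real system would sample an energy map instead.
    """
    offset = pts - center
    dist = np.linalg.norm(offset, axis=1, keepdims=True)
    return (dist - radius) * offset / np.maximum(dist, 1e-12)

def evolve_snake(points, ext_grad, alpha=0.1, step=0.2, iters=200):
    """Explicit gradient descent for a closed contour of (x, y) vertices."""
    pts = np.asarray(points, dtype=float).copy()
    for _ in range(iters):
        # Discrete second derivative along the closed contour (elasticity).
        laplacian = np.roll(pts, 1, axis=0) - 2.0 * pts + np.roll(pts, -1, axis=0)
        # Move each vertex down the combined internal + external energy.
        pts += step * (alpha * laplacian - ext_grad(pts))
    return pts

# Assumed toy setup: a distorted initial contour inside a reference circle.
center = np.array([0.0, 0.0])
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
initial_radius = 6.0 + 1.5 * np.sin(3.0 * theta)
initial = np.stack([initial_radius * np.cos(theta),
                    initial_radius * np.sin(theta)], axis=1)

restored = evolve_snake(
    initial,
    lambda p: reference_energy_gradient(p, center, radius=10.0),
)
```

In this toy setting the contour settles slightly inside the reference circle, because the elasticity term pulls the vertices inward while the external term pulls them onto the reference boundary. Raising `alpha` strengthens the smoothing at the cost of fidelity to the reference.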
Cite this article
Allier, B., Bali, N. & Emptoz, H. Automatic accurate broken character restoration for patrimonial documents. IJDAR 8, 246–261 (2006). https://doi.org/10.1007/s10032-005-0012-7
DOI: https://doi.org/10.1007/s10032-005-0012-7