Automatic accurate broken character restoration for patrimonial documents

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

In this article, we address the restoration of character shapes in images of antique documents. Documents of this class generally carry a great deal of unintended historical information that has to be taken into account to build quality digital libraries. Many document processing methods have already been proposed to cope with degraded character images, but they often replace the degraded shapes with a corresponding prototype, which many specialists find unsatisfactory. We therefore developed our own method for accurate character restoration, based on generic image processing tools (namely Gabor filtering and the active contour model) complemented by specific, automatically extracted structural information. The principle of the method is to make an active contour recover the lost information by means of an external energy term derived from an automatically built and selected reference character image. Results are presented for real examples taken from printed and handwritten documents.
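
The principle stated above, an active contour driven by an external energy computed from a reference character image, can be illustrated with a small sketch. The abstract does not give the energy formulation, so everything below is an assumption made for illustration: the external energy is taken to be a distance map to the outline of the reference image, the contour is moved by a simple greedy scheme rather than the variational snake formulation, and the names `distance_map`, `greedy_snake` and `alpha` are hypothetical.

```python
from collections import deque


def distance_map(reference):
    """Distance (in 4-connected steps) from every pixel to the nearest ink pixel.

    `reference` is a list of rows of 0/1 values holding the outline of the
    reference character; it must contain at least one 1 (assumption).
    """
    height, width = len(reference), len(reference[0])
    dist = [[None] * width for _ in range(height)]
    queue = deque()
    for y in range(height):
        for x in range(width):
            if reference[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    # Multi-source breadth-first search outward from the reference outline.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def greedy_snake(points, external, alpha=0.1, iterations=50):
    """Move each point of a closed contour to the lowest-energy pixel nearby.

    Energy = alpha * internal + external, where the internal term is the
    squared distance to the midpoint of the two neighbouring contour points
    (a smoothness penalty) and the external term is read from `external`.
    """
    height, width = len(external), len(external[0])
    pts = list(points)
    n = len(pts)

    def energy(i, y, x):
        (py, px), (qy, qx) = pts[i - 1], pts[(i + 1) % n]
        my, mx = (py + qy) / 2, (px + qx) / 2
        internal = (y - my) ** 2 + (x - mx) ** 2
        return alpha * internal + external[y][x]

    for _ in range(iterations):
        moved = False
        for i in range(n):
            y0, x0 = pts[i]
            best, best_e = (y0, x0), energy(i, y0, x0)
            # Examine the 3x3 neighbourhood; move only on a strict improvement.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    y, x = y0 + dy, x0 + dx
                    if 0 <= y < height and 0 <= x < width:
                        e = energy(i, y, x)
                        if e < best_e:
                            best, best_e = (y, x), e
            if best != (y0, x0):
                pts[i] = best
                moved = True
        if not moved:
            break  # no point moved during a full pass: the contour is stable
    return pts
```

In this sketch the reference image acts only through `external`, so a broken stroke in the degraded character does not stop the contour: the points keep sliding toward the reference outline across the gap. The weight `alpha` trades smoothness against fidelity to the reference and would have to be tuned on real data.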

Author information

Corresponding author

Correspondence to Bénédicte Allier.

About this article

Cite this article

Allier, B., Bali, N. & Emptoz, H. Automatic accurate broken character restoration for patrimonial documents. IJDAR 8, 246–261 (2006). https://doi.org/10.1007/s10032-005-0012-7

  • DOI: https://doi.org/10.1007/s10032-005-0012-7
