Historical document enhancement using LUT classification

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

The rapid evolution of scanning and computing technologies in recent years has led to the creation of large collections of scanned historical documents. These scanned documents almost always suffer from some form of degradation. Severe degradation makes documents hard to read and substantially reduces the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models, which do not perform well when the degradation is severe. In contrast, we propose to learn local degradation models and use them to enhance degraded document images. Using a semi-automated enhancement system, we labeled a subset of the Frieder diaries collection (The diaries of Rabbi Dr. Avraham Abba Frieder. http://ir.iit.edu/collections/). This labeled subset was then used to train classifiers based on lookup tables (LUTs) in conjunction with the approximate nearest neighbor algorithm. The resulting algorithm is highly efficient and effective. Experimental evaluation results are provided using the Frieder diaries collection.
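
The idea of a lookup-table classifier backed by a nearest neighbor search can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the details, so the binary input images, the 3x3 window, the majority vote per pattern, and the brute-force Hamming search (standing in for a true approximate nearest neighbor structure) are all assumptions, and the names `patches`, `train_lut`, `nearest_label`, and `enhance` are hypothetical.

```python
from collections import Counter, defaultdict


def patches(image, radius=1):
    """Yield (row, col, key) per pixel; key is the flattened local window
    as a tuple, zero-padded outside the image (window size is an assumption)."""
    h, w = len(image), len(image[0])
    for r in range(h):
        for c in range(w):
            key = tuple(
                image[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
                for rr in range(r - radius, r + radius + 1)
                for cc in range(c - radius, c + radius + 1)
            )
            yield r, c, key


def train_lut(degraded, clean, radius=1):
    """Build a table mapping each observed degraded window to the most
    frequent clean value of its center pixel (majority vote is an assumption)."""
    votes = defaultdict(Counter)
    for r, c, key in patches(degraded, radius):
        votes[key][clean[r][c]] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def nearest_label(lut, key):
    """Fallback for unseen windows: exact brute-force Hamming search,
    a simple stand-in for an approximate nearest neighbor index."""
    best = min(lut, key=lambda k: sum(a != b for a, b in zip(k, key)))
    return lut[best]


def enhance(image, lut, radius=1):
    """Relabel every pixel from the table, using the fallback when needed."""
    out = [row[:] for row in image]
    for r, c, key in patches(image, radius):
        label = lut.get(key)
        if label is None:
            label = nearest_label(lut, key)
        out[r][c] = label
    return out


# Toy training pair: a short stroke plus an isolated speck in the corner.
degraded = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
clean = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
lut = train_lut(degraded, clean)
restored = enhance(degraded, lut)
```

In this toy setting the table learns to keep stroke pixels and to remove an isolated speck, and windows never seen in training are labeled by their closest stored window. A practical system would replace the linear scan in `nearest_label` with an indexed search to keep enhancement fast on full-page scans.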

Author information

Corresponding author

Correspondence to Tayo Obafemi-Ajayi.

About this article

Cite this article

Obafemi-Ajayi, T., Agam, G. & Frieder, O. Historical document enhancement using LUT classification. IJDAR 13, 1–17 (2010). https://doi.org/10.1007/s10032-009-0099-3

  • DOI: https://doi.org/10.1007/s10032-009-0099-3
