Abstract
The rapid evolution of scanning and computing technologies in recent years has led to the creation of large collections of scanned historical documents. These scanned documents almost always suffer from some form of degradation. Severe degradation makes documents hard to read and substantially reduces the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models, which do not perform well when the degradation is severe. In contrast, we propose to learn local degradation models and use them to enhance degraded document images. Using a semi-automated enhancement system, we labeled a subset of the Frieder diaries collection (The diaries of Rabbi Dr. Avraham Abba Frieder. http://ir.iit.edu/collections/). This labeled subset was then used to train classifiers based on lookup tables in conjunction with the approximate nearest neighbor algorithm. The resulting algorithm is highly efficient and effective. Experimental evaluation results are provided using the same collection.
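The idea of a lookup-table classifier backed by a nearest-neighbor fallback can be illustrated with a minimal sketch. This is not the authors' implementation: the 3x3 binary patch features, the majority-vote table, and all function names (`patch_code`, `train_lut`, `lookup`, `enhance`) are assumptions made for illustration, and an exact brute-force Hamming search stands in for the approximate nearest neighbor search named in the abstract.

```python
from collections import Counter, defaultdict

def to_img(rows):
    """Convert strings of '0'/'1' into a list-of-lists binary image."""
    return [[int(ch) for ch in row] for row in rows]

def patch_code(img, r, c):
    """Pack the 3x3 binary neighborhood around (r, c) into a 9-bit integer."""
    h, w = len(img), len(img[0])
    code = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            # Pixels outside the image are treated as background (0).
            bit = img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            code = (code << 1) | bit
    return code

def train_lut(degraded, clean):
    """Map each observed local pattern to its most frequent clean label."""
    votes = defaultdict(Counter)
    for r in range(len(degraded)):
        for c in range(len(degraded[0])):
            votes[patch_code(degraded, r, c)][clean[r][c]] += 1
    return {code: cnt.most_common(1)[0][0] for code, cnt in votes.items()}

def lookup(lut, code):
    """Return the label for a pattern; unseen patterns borrow the label of
    the closest stored pattern in Hamming distance (exact search here,
    a stand-in for an approximate nearest neighbor index)."""
    if code in lut:
        return lut[code]
    nearest = min(lut, key=lambda k: (bin(k ^ code).count("1"), k))
    return lut[nearest]

def enhance(lut, degraded):
    """Relabel every pixel from its local pattern."""
    return [[lookup(lut, patch_code(degraded, r, c))
             for c in range(len(degraded[0]))]
            for r in range(len(degraded))]

# Hypothetical toy data: a stroke with one hole, plus one isolated speck.
clean = to_img(["0000000", "0000000", "1111111", "0000000", "0000000"])
degraded = to_img(["0000010", "0000000", "1110111", "0000000", "0000000"])
lut = train_lut(degraded, clean)
restored = enhance(lut, degraded)
```

On this toy pair the table learns to fill the hole in the stroke and to erase the speck, because those local patterns are labeled that way in the training data. A table over local patterns keeps classification to a single lookup per pixel, and the nearest-neighbor step only runs for patterns that were never seen during training.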
References
Agam, G., Bal, G., Frieder, G., Frieder, O.: Degraded document image enhancement. In: Lin, X., Yanikoglu, B.A. (eds.) Document Recognition and Retrieval XIV. Proceedings of the SPIE, vol. 6500, pp. 65000C–1–65000C–11 (2007)
Allier, B., Bali, N., Emptoz, H.: Automatic accurate broken character restoration for patrimonial documents. Int. J. Document Anal. Recognit. 8(4), 246–261 (2006)
Antonacopoulos, A., Castilla, C.: Flexible text recovery from degraded typewritten historical documents. In: Proceedings of the 18th International Conference on Pattern Recognition ICPR’06, pp. 1062–1065 (2006)
Antonacopoulos, A., Karatzas, D.: A complete approach to the conversion of typewritten historical documents for digital archives. In: Proceedings of the IAPR International Workshop on Document Analysis Systems DAS’04, pp. 90–101 (2004)
Antonacopoulos, A., Karatzas, D.: Document image analysis for World War II personal records. In: Proceedings of the International Workshop on Document Image Analysis for Libraries DIAL’04 (2004)
Antonacopoulos, A., Karatzas, D.: Semantics-based content extraction in typewritten historical documents. In: Proceedings of the International Conference on Document Analysis and Recognition ICDAR’05 (2005)
Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y.: An optimal algorithm for approximate nearest neighbor searching. J. ACM 45(6), 891–923 (1998)
Badekas, E., Nikolaou, N., Papamarkos, N.: Text binarization in color documents. Int. J. Imaging Syst. Technol. 16(6), 262–274 (2006)
Baird, H.: Document image quality: making fine discriminations. In: Proceedings of the International Conference on Document Analysis and Recognition (1999)
Bal, G., Agam, G., Frieder, G., Frieder, O.: Interactive degraded document enhancement and ground truth generation. In: Yanikoglu, B., Berkner, K. (eds.) Document Recognition and Retrieval XV. Proceedings of the SPIE, vol. 6815 (2008)
Bernsen, J.: Dynamic thresholding of gray-level images. In: Proceedings of the 8th International Conference on Pattern Recognition, pp. 1251–1255 (1986)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2, 121–167 (1998)
Cannon, M., Hochberg, J., Kelly, P.: Quality assessment and restoration of typewritten document images. Int. J. Document Anal. Recognit. 2(2–3), 80–89 (1999)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Droettboom, M., MacMillan, K., Fujinaga, I.: The Gamera framework for building custom recognition systems. In: Symposium on Document Image Understanding Technologies, pp. 275–286 (2003)
Du, K., Lu, J., Sekiya, H., Sun, Y., Yahagi, T.: Postprocessing for restoring edges and removing artifacts of low bit rates wavelet-based image. In: Proceedings of the International Symposium on Intelligent Signal Processing and Communications, ISPACS ’06, pp. 943–946 (2006)
The diaries of Rabbi Dr. Avraham Abba Frieder. http://ir.iit.edu/collections/
Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3(3), 209–226 (1977)
Gatos, B., Pratikakis, I., Perantonis, S.J.: An adaptive binarization technique for low quality historical documents. In: International Workshop Document Analysis Systems (DAS), pp. 102–113 (2004)
Gatos, B., Pratikakis, I., Perantonis, S.J.: Adaptive degraded document image binarization. Pattern Recognit. 39(6), 317–327 (2006)
Kanungo, T.: Document degradation models and a methodology for degradation model validation. Ph.D. thesis, University of Washington (1996)
Kavallieratou, E., Stamatatos, E.: Improving the quality of degraded document images. In: International Conference Document Image Analysis for Libraries DIAL’06 (2006)
Niblack, W.: An Introduction to Digital Image Processing. Prentice Hall (1986)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Obafemi-Ajayi, T., Agam, G., Frieder, O.: Ensemble LUT classification for degraded document enhancement. In: Yanikoglu, B., Berkner, K. (eds.) Document Recognition and Retrieval XV. Proceedings of the SPIE, vol. 6815 (2008)
Sauvola, J., Pietikainen, M.: Adaptive document image binarization. Pattern Recognit. 33, 225–236 (2000)
Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13, 146–165 (2004)
Smith, R.: An overview of the Tesseract OCR engine. In: Proceedings of the International Conference on Document Analysis and Recognition, vol. 2, pp. 629–633 (2007)
Stubberud, P., Kana, J., Kallurit, V.: Adaptive image restoration of text images that contain touching or broken characters. In: Proceedings of the Third International Conference on Document Analysis and Recognition (ICDAR’95) (1995)
Ye, X., Suen, C.Y., Cheriet, M.: A generic method of cleaning and enhancing handwritten data from business forms. Int. J. Document Anal. Recognit. 4, 84–96 (2001)
Zheng, Q., Kanungo, T.: Estimation of morphological degradation model parameters. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’01, pp. 1961–1964 (2001)
Zheng, Q., Kanungo, T.: Morphological degradation models and their use in document image restoration. In: International Conference on Image Processing, pp. 193–196 (2001)
Cite this article
Obafemi-Ajayi, T., Agam, G. & Frieder, O. Historical document enhancement using LUT classification. IJDAR 13, 1–17 (2010). https://doi.org/10.1007/s10032-009-0099-3
DOI: https://doi.org/10.1007/s10032-009-0099-3