Abstract
In this paper, we propose a new algorithm for the binarization of degraded document images. We map the image into a 2D feature space in which text and background pixels are separable, and then partition this feature space into small regions. These regions are labeled as text or background using the result of a basic binarization algorithm applied to the original image. Finally, each pixel of the image is classified as either text or background according to the label of its corresponding region in the feature space. Our algorithm splits the feature space into text and background regions without using any training dataset. In addition, it requires no parameter setting by the user and is suitable for various types of degraded document images. The proposed algorithm outperformed six well-known algorithms on three datasets.
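The four steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two features, the partitioning scheme, or the basic binarizer. The sketch assumes, purely for illustration, that the features are gray level and gray level minus a local mean, that the feature space is split into a uniform grid, that Otsu's global threshold serves as the basic binarizer, and that each grid cell takes the majority label of its pixels. The names `binarize_by_feature_space`, `bins`, and `radius` are hypothetical, and the fixed defaults stand in for whatever automatic choices the paper makes.

```python
import numpy as np


def otsu_threshold(gray):
    """Global Otsu threshold of a uint8 image (assumed 'basic' binarizer)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of levels 0..t
    mu = np.cumsum(prob * np.arange(256))     # cumulative first moment
    denom = omega * (1.0 - omega)
    valid = denom > 0                         # skip levels with an empty class
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))            # between-class variance maximum


def local_mean(gray, radius):
    """Mean over a (2*radius+1)^2 window, computed with an integral image."""
    g = np.pad(gray.astype(float), radius, mode="edge")
    ii = np.pad(g.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    k = 2 * radius + 1
    window_sum = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    return window_sum / (k * k)


def binarize_by_feature_space(gray, bins=32, radius=7):
    """Return a boolean mask that is True where a pixel is classified as text."""
    gray = np.asarray(gray, dtype=np.uint8)

    # Step 1: map every pixel to a 2D feature vector (features are assumed).
    f1 = gray.astype(float)                   # gray level, in [0, 255]
    f2 = f1 - local_mean(gray, radius)        # local contrast, in [-255, 255]

    # Step 2: partition the feature space into bins x bins small regions.
    i1 = np.minimum((f1 / 256.0 * bins).astype(int), bins - 1)
    i2 = np.minimum(((f2 + 255.0) / 511.0 * bins).astype(int), bins - 1)
    cell = i1 * bins + i2

    # Step 3: label each region from a rough binarization of the image.
    rough_text = gray <= otsu_threshold(gray)  # dark pixels taken as text
    total = np.bincount(cell.ravel(), minlength=bins * bins)
    text = np.bincount(cell.ravel(),
                       weights=rough_text.ravel().astype(float),
                       minlength=bins * bins)
    cell_is_text = 2 * text > total           # majority vote per region

    # Step 4: classify each pixel by the label of its region.
    return cell_is_text[cell]
```

For example, calling `binarize_by_feature_space(img)` on a grayscale `uint8` array returns a mask of the same shape. Because labels are assigned per region rather than per pixel, a pixel that the rough binarizer misclassifies can still be corrected when most pixels sharing its region carry the other label.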
Cite this article
Valizadeh, M., Kabir, E. Binarization of degraded document image based on feature space partitioning and classification. IJDAR 15, 57–69 (2012). https://doi.org/10.1007/s10032-010-0142-4
DOI: https://doi.org/10.1007/s10032-010-0142-4