Abstract
In this paper, we present an adaptive water flow model for the binarization of degraded document images. We regard the image surface as a three-dimensional terrain and pour water onto it. The water finds the valleys and fills them. Our algorithm controls the rainfall process so that the water fills each valley up to half of its depth. After the rainfall stops, each wet region represents either one character or a noisy component. To segment the characters, we label the wet regions and regard them as blobs; since some of the blobs are noisy components, we use a multilayer perceptron to classify each blob as either text or non-text. Because our algorithm classifies blobs rather than pixels, it preserves stroke connectivity. In our experiments, the proposed binarization algorithm outperformed six well-known algorithms on three sets of degraded document images. Its main advantage is on document images with uneven illumination.
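The pipeline described above (fill valleys to half depth, then label the wet regions as blobs) can be illustrated with a minimal sketch. This is not the paper's rainfall simulation: it swaps in a simple local mid-range threshold as a stand-in, treating the local minimum as the valley floor and the local maximum as the rim. The function names and the `radius` and `min_depth` parameters are hypothetical, and the multilayer perceptron stage that separates text blobs from noise is not shown.

```python
from collections import deque


def wet_mask(image, radius=2, min_depth=40):
    """Mark pixels lying in the lower half of their local valley.

    Stand-in for the rainfall simulation (an assumption, not the paper's
    method): a pixel is "wet" when its gray level is at most halfway between
    the local minimum (valley floor) and the local maximum (valley rim).
    radius and min_depth are hypothetical parameters.
    """
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            floor, rim = min(window), max(window)
            depth = rim - floor
            # Shallow valleys are treated as flat background and stay dry.
            if depth >= min_depth and image[r][c] <= floor + depth / 2:
                mask[r][c] = True
    return mask


def label_blobs(mask):
    """Label 8-connected wet regions; returns (labels, blob_count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            # Start a new blob and flood it with a breadth-first search.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy 5x9 image: bright background (200) with two dark vertical strokes (30).
image = [[200] * 9 for _ in range(5)]
for r in range(1, 4):
    image[r][2] = 30
    image[r][6] = 30

labels, blob_count = label_blobs(wet_mask(image))
```

In this toy image the two strokes come out as two separate blobs. In a full system, features of each blob would then be passed to a classifier to decide between text and non-text, so that a blob is kept or discarded as a whole rather than pixel by pixel.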
Cite this article
Valizadeh, M., Kabir, E. An adaptive water flow model for binarization of degraded document images. IJDAR 16, 165–176 (2013). https://doi.org/10.1007/s10032-012-0182-z
DOI: https://doi.org/10.1007/s10032-012-0182-z