Binarization of degraded document image based on feature space partitioning and classification

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this paper, we propose a new algorithm for the binarization of degraded document images. We map the image into a 2D feature space in which the text and background pixels are separable, and then partition this feature space into small regions. These regions are labeled as text or background using the result of a basic binarization algorithm applied to the original image. Finally, each pixel of the image is classified as either text or background according to the label of its corresponding region in the feature space. Our algorithm splits the feature space into text and background regions without using any training dataset. In addition, it requires no parameter setting by the user and is appropriate for various types of degraded document images. The proposed algorithm outperformed six well-known algorithms on three datasets.
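The four steps in the abstract (map to a 2D feature space, partition it, label the regions from a basic binarization, classify each pixel by its region) can be illustrated with a minimal sketch. The abstract does not say which features, partitioning scheme, or basic binarization algorithm the authors use, so everything specific here is an assumption made for illustration: the two features (intensity, and intensity minus local mean), the uniform grid partition, the global mean threshold standing in for the basic algorithm, and the names `local_mean`, `basic_binarization`, `binarize`, `bins`, and `radius`. The `bins` and `radius` arguments exist only in this sketch; the published algorithm is described as needing no user-set parameters.

```python
import numpy as np


def local_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, computed with an integral image."""
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    # Prepend a zero row and column so window sums are simple differences.
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    k = 2 * radius + 1
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def basic_binarization(img):
    """Assumed stand-in for the basic algorithm: global mean threshold.

    Returns True where a pixel is darker than the image mean (text).
    """
    return img < img.mean()


def binarize(img, bins=16, radius=7):
    """Classify pixels by the label of their cell in a 2D feature space."""
    img = img.astype(np.float64)

    # Step 1: map each pixel to a 2D feature vector (assumed features).
    f1 = img
    f2 = img - local_mean(img, radius)

    # Step 2: partition the feature space into a bins x bins grid (assumed).
    def quantize(f):
        lo, hi = f.min(), f.max()
        if hi == lo:
            return np.zeros(f.shape, dtype=np.intp)
        idx = ((f - lo) / (hi - lo) * bins).astype(np.intp)
        return np.minimum(idx, bins - 1)

    cell = quantize(f1) * bins + quantize(f2)

    # Step 3: label each cell by majority vote of the basic binarization.
    rough = basic_binarization(img).astype(np.float64)
    text_votes = np.bincount(cell.ravel(), weights=rough.ravel(),
                             minlength=bins * bins)
    all_votes = np.bincount(cell.ravel(), minlength=bins * bins)
    cell_is_text = 2 * text_votes > all_votes

    # Step 4: each pixel inherits the label of its cell.
    return cell_is_text[cell]
```

Calling `binarize(gray)` on a 2D grayscale array returns a boolean mask of the same shape, with `True` marking pixels classified as text. The majority vote in step 3 is what lets the sketch run without training data: the rough result only has to be right for most pixels in a cell for the whole cell to receive the correct label.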

Author information

Corresponding author

Correspondence to Morteza Valizadeh.

About this article

Cite this article

Valizadeh, M., Kabir, E. Binarization of degraded document image based on feature space partitioning and classification. IJDAR 15, 57–69 (2012). https://doi.org/10.1007/s10032-010-0142-4

  • DOI: https://doi.org/10.1007/s10032-010-0142-4
