Region Based Approach for Binarization of Degraded Document Images

  • Conference paper
Advances in Soft and Hard Computing (ACS 2018)

Abstract

Binarization of highly degraded document images is one of the key steps of image preprocessing, as it influences the final results of subsequent text recognition and document analysis. Since the contaminations visible on such documents are usually local, the most popular fast global thresholding methods should not be applied directly to such images. On the other hand, typical adaptive methods based on the analysis of the neighbourhood of each pixel are time consuming and do not always lead to satisfactory results. To bridge the gap between these two approaches, this paper proposes region based modifications of some histogram based thresholding methods. The approach has been verified for the well known Otsu, Rosin and Kapur algorithms using the challenging images from the Bickley Diary dataset. The experimental results obtained for the region based Otsu and Kapur methods are superior to those of the corresponding global methods and may be the basis for further research towards combined region based binarization of degraded document images.
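The abstract does not give the details of the region based procedure, so the following is only a minimal sketch of the general idea, assuming a plain split of the image into square blocks with a separate Otsu threshold computed from each block's histogram. The function names `otsu_threshold` and `binarize_regions`, the `block` parameter and its default value are hypothetical choices made for illustration and are not taken from the paper.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit grey levels (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of the grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break             # no pixels left for the bright class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize_regions(image, block=16):
    """Binarize a greyscale image (list of rows) block by block.

    Assumption for this sketch: non-overlapping square blocks, each
    thresholded independently. A block with a single grey level gets
    threshold 0, so it is mapped to white unless it is pure black.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            t = otsu_threshold(image[r][c] for r in rows for c in cols)
            for r in rows:
                for c in cols:
                    out[r][c] = 255 if image[r][c] > t else 0
    return out
```

Calling `binarize_regions(image, block=16)` on a greyscale image stored as a list of rows returns an image of the same size containing only 0 and 255. Because every block gets its own threshold, a locally darkened background can still be separated from the text written on it, which is the situation where a single global threshold tends to fail.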

References

  1. Bradley, D., Roth, G.: Adaptive thresholding using the integral image. J. Graph. Tools 12(2), 13–21 (2007)

  2. Chou, C.H., Lin, W.H., Chang, F.: A binarization method with learning-built rules for document images produced by cameras. Pattern Recognit. 43(4), 1518–1530 (2010)

  3. Deng, F., Wu, Z., Lu, Z., Brown, M.S.: BinarizationShop: a user assisted software suite for converting old documents to black-and-white. In: Proceedings of the Annual Joint Conference on Digital Libraries, pp. 255–258 (2010)

  4. Feng, M.L., Tan, Y.P.: Adaptive binarization method for document image analysis. In: Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME), vol. 1, pp. 339–342, June 2004

  5. Gatos, B., Pratikakis, I., Perantonis, S.: Adaptive degraded document image binarization. Pattern Recognit. 39(3), 317–327 (2006)

  6. Kapur, J., Sahoo, P., Wong, A.: A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Process. 29(3), 273–285 (1985)

  7. Khurshid, K., Siddiqi, I., Faure, C., Vincent, N.: Comparison of Niblack inspired binarization methods for ancient documents. In: Document Recognition and Retrieval XVI, vol. 7247, pp. 7247–7247-9 (2009)

  8. Kulyukin, V., Kutiyanawala, A., Zaman, T.: Eyes-free barcode detection on smartphones with Niblack’s binarization and Support Vector Machines. In: Proceedings of the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV 2012) at the World Congress in Computer Science, Computer Engineering, and Applied Computing WORLDCOMP, vol. 1, pp. 284–290. CSREA Press, July 2012

  9. Lech, P., Okarma, K.: Fast histogram based image binarization using the Monte Carlo threshold estimation. In: Chmielewski, L.J., Kozera, R., Shin, B.S., Wojciechowski, K. (eds.) Computer Vision and Graphics. Lecture Notes in Computer Science, vol. 8671, pp. 382–390. Springer, Cham (2014)

  10. Lech, P., Okarma, K.: Optimization of the fast image binarization method based on the Monte Carlo approach. Elektronika Ir Elektrotechnika 20(4), 63–66 (2014)

  11. Lech, P., Okarma, K.: Prediction of the optical character recognition accuracy based on the combined assessment of image binarization results. Elektronika Ir Elektrotechnika 21(6), 62–65 (2015)

  12. Leedham, G., Yan, C., Takru, K., Tan, J.H.N., Mian, L.: Comparison of some thresholding algorithms for text/background segmentation in difficult document images. In: Proceedings of the 7th International Conference on Document Analysis and Recognition, ICDAR 2003, pp. 859–864, August 2003

  13. Michalak, H., Okarma, K.: Fast adaptive image binarization using the region based approach. In: Silhavy, R. (ed.) Artificial Intelligence and Algorithms in Intelligent Systems. Advances in Intelligent Systems and Computing, vol. 764, pp. 79–90. Springer, Cham (2019)

  14. Moghaddam, R.F., Cheriet, M.: AdOtsu: an adaptive and parameterless generalization of Otsu’s method for document image binarization. Pattern Recognit. 45(6), 2419–2431 (2012)

  15. Niblack, W.: An Introduction to Digital Image Processing. Prentice Hall, Englewood Cliffs (1986)

  16. Okarma, K., Lech, P.: Fast statistical image binarization of colour images for the recognition of the QR codes. Elektronika Ir Elektrotechnika 21(3), 58–61 (2015)

  17. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)

  18. Pratikakis, I., Zagoris, K., Barlas, G., Gatos, B.: ICDAR 2017 Document Image Binarization COmpetition (DIBCO 2017) (2017). https://vc.ee.duth.gr/dibco2017/

  19. Rosin, P.L.: Unimodal thresholding. Pattern Recognit. 34(11), 2083–2096 (2001)

  20. Samorodova, O.A., Samorodov, A.V.: Fast implementation of the Niblack binarization algorithm for microscope image segmentation. Pattern Recognit. Image Anal. 26(3), 548–551 (2016)

  21. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recognit. 33(2), 225–236 (2000)

  22. Saxena, L.P.: Niblack’s binarization method and its modifications to real-time applications: a review. Artif. Intell. Rev., 1–33 (2017)

  23. Shrivastava, A., Srivastava, D.K.: A review on pixel-based binarization of gray images. Advances in Intelligent Systems and Computing, vol. 439, pp. 357–364. Springer, Singapore (2016)

  24. Su, B., Lu, S., Tan, C.L.: Robust document image binarization technique for degraded document images. IEEE Trans. Image Process. 22(4), 1408–1417 (2013)

  25. Wen, J., Li, S., Sun, J.: A new binarization method for non-uniform illuminated document images. Pattern Recognit. 46(6), 1670–1690 (2013)

  26. Wolf, C., Jolion, J.M.: Extraction and recognition of artificial text in multimedia documents. Form. Pattern Anal. Appl. 6(4), 309–326 (2004)

  27. Yoon, Y., Ban, K.D., Yoon, H., Lee, J., Kim, J.: Best combination of binarization methods for license plate character segmentation. ETRI J. 35(3), 491–500 (2013)

Author information

Corresponding author

Correspondence to Krzysztof Okarma.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Michalak, H., Okarma, K. (2019). Region Based Approach for Binarization of Degraded Document Images. In: Pejaś, J., El Fray, I., Hyla, T., Kacprzyk, J. (eds) Advances in Soft and Hard Computing. ACS 2018. Advances in Intelligent Systems and Computing, vol 889. Springer, Cham. https://doi.org/10.1007/978-3-030-03314-9_37
