Fast and accurate scene text understanding with image binarization and off-the-shelf OCR

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

While modern off-the-shelf OCR engines achieve particularly high accuracy on scanned text, text detection and recognition in natural images remain a challenging problem. Here, we demonstrate that OCR engines can still perform well on this harder task as long as an appropriate image binarization is applied to the input photographs. We propose a new binarization algorithm that is particularly suitable for scene text and systematically evaluate its performance alongside 12 existing binarization methods. Whereas most existing binarization techniques are designed either for text detection or for the recognition of already localized text, our method yields very similar results on both large images and localized text regions. It can therefore be applied to large images directly, with no need to re-binarize the localized text regions. We also propose a real-time variant of this method based on linear-time bilateral filtering. Evaluation across different metrics on established natural-image text recognition benchmarks (ICDAR 2003 and ICDAR 2011) shows that our simple and fast image binarization method, combined with an off-the-shelf OCR engine, achieves state-of-the-art performance for end-to-end text understanding in natural images and outperforms recent, more complex methods.
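
The abstract positions the proposed algorithm against existing binarization methods. As background only, the sketch below shows one classical local-thresholding baseline from the reference list, the threshold of Sauvola and Pietikäinen (ref. 29), T = m(1 + k(s/R - 1)), where m and s are the mean and standard deviation of a window around each pixel. It computes the window statistics with integral images, so the cost per pixel does not depend on the window size. This is not the authors' algorithm, which this preview does not describe. Assumptions: a grayscale image given as a list of rows, dark text on a lighter background (light-on-dark scene text would have to be inverted first), and illustrative defaults window = 15, k = 0.5, R = 128; the function names are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Image = List[List[float]]


def integral_images(img: Image) -> Tuple[Image, Image]:
    """Summed-area tables of pixel values and of squared pixel values.

    Both tables have an extra leading row and column of zeros, so entry
    [y][x] holds the sum over rows 0..y-1 and columns 0..x-1.
    """
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    sq = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        row_sq = 0.0
        for x in range(w):
            v = float(img[y][x])
            row_sum += v
            row_sq += v * v
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
            sq[y + 1][x + 1] = sq[y][x + 1] + row_sq
    return s, sq


def box_sum(table: Image, y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over rows y0..y1-1 and columns x0..x1-1 in constant time."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def sauvola_binarize(img: Image, window: int = 15, k: float = 0.5,
                     r: float = 128.0) -> List[List[int]]:
    """Return a mask with 1 for dark (foreground) pixels and 0 for background.

    Threshold per pixel: T = m * (1 + k * (s / r - 1)), with m and s taken
    over a window centered on the pixel and clipped at the image borders.
    """
    h, w = len(img), len(img[0])
    s, sq = integral_images(img)
    half = window // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = box_sum(s, y0, x0, y1, x1) / n
            var = box_sum(sq, y0, x0, y1, x1) / n - mean * mean
            std = sqrt(max(var, 0.0))  # guard against tiny negative rounding error
            threshold = mean * (1.0 + k * (std / r - 1.0))
            mask[y][x] = 1 if img[y][x] <= threshold else 0
    return mask


if __name__ == "__main__":
    patch = [[200, 200, 200],
             [200, 0, 200],
             [200, 200, 200]]
    for row in sauvola_binarize(patch, window=3):
        print(row)
```

On the 3×3 demo patch (value 200 with a single 0 in the center, `window=3`), only the center pixel is marked as foreground, and a uniform image produces an all-background mask because its local standard deviation is zero. Integral images are a common way to make local-statistics thresholds linear in the number of pixels; the paper's own real-time variant instead relies on linear-time bilateral filtering.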

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Notes

  1. http://www.nuance.com/.

  2. http://code.google.com/p/tesseract-ocr.

  3. http://graphics.cs.msu.ru/en/research/projects/msr/text.

  4. http://liris.cnrs.fr/christian.wolf/software/binarize/index.html.

  5. http://www.comp.nus.edu.sg/~subolan/.

  6. https://sites.google.com/site/roboticssaurav/strokewidthnokia.

  7. http://graphics.cs.msu.ru/en/research/projects/msr/text.

  8. http://www.mathworks.com/products/image/.

References

  1. Adams, A., Gelfand, N., Dolson, J., Levoy, M.: Gaussian KD-trees for fast high-dimensional filtering. ACM Trans. Graph. (TOG) 28(3), 21 (2009)

  2. Badekas, E., Papamarkos, N.: Automatic evaluation of document binarization results. In: CIARP, pp. 1005–1014 (2005)

  3. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: ICCV, pp. 105–112 (2001)

  4. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. (2004)

  5. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1124–1137 (2004)

  6. Clavelli, A., Karatzas, D., Lladós, J.: A framework for the assessment of text extraction algorithms on complex colour images. In: Document Analysis Systems, pp. 19–26 (2010)

  7. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR (2010)

  8. Ezaki, N.: Text detection from natural scene images: towards a system for visually impaired persons. In: International Conference on Pattern Recognition, pp. 683–686 (2004)

  9. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Ann. Stat. 28(2), 337–407 (2000)

  10. Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR 2009 document image binarization contest (DIBCO 2009). In: ICDAR, pp. 1375–1382 (2009)

  11. Gatos, B., Pratikakis, I., Perantonis, S.J.: Text detection in indoor/outdoor scene images. In: CBDAR’05, pp. 127–132 (2005)

  12. He, K., Sun, J., Tang, X.: Guided image filtering. In: Computer Vision – ECCV 2010, pp. 1–14. Springer (2010)

  13. Howe, N.: A Laplacian energy for document binarization. In: ICDAR, pp. 6–10 (2011)

  14. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., de las Heras, L.P.: ICDAR 2013 robust reading competition. In: 2013 International Conference on Document Analysis and Recognition (ICDAR). IEEE (2013)

  15. Kimmel, R., Bruckstein, A.M.: Regularized Laplacian zero crossings as optimal edge integrators. Int. J. Comput. Vis. 53(3), 225–243 (2003)

  16. Kittler, J., Illingworth, J.: Minimum error thresholding. Pattern Recogn. 19, 41–47 (1986)

  17. Lu, S., Su, B., Tan, C.L.: Document image binarization using background estimation and stroke edges. IJDAR 13(4), 303–314 (2010)

  18. Milyaev, S., Barinova, O., Novikova, T., Lempitsky, V., Kohli, P.: Image binarization for end-to-end text understanding in natural images. In: ICDAR (2013)

  19. Minetto, R., Thome, N., Cord, M., Stolfi, J., Precioso, F., Guyomard, J., Leite, N.J.: Text detection and recognition in urban scenes. In: ICCV Workshops, pp. 227–234 (2011)

  20. Mishra, A., Alahari, K., Jawahar, C.V.: An MRF model for binarization of natural scene text. In: ICDAR, pp. 11–16 (2011)

  21. Neumann, L., Matas, J.: Estimating hidden parameters for text localization and recognition. In: Computer Vision Winter Workshop (2011)

  22. Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: CVPR, pp. 3538–3545 (2012)

  23. Neumann, L., Matas, J.: Scene text localization and recognition with oriented stroke detection. In: 2013 IEEE International Conference on Computer Vision (ICCV 2013), pp. 97–104 (2013)

  24. Niblack, W.: An introduction to digital image processing. Strandberg Publishing, Denmark (1985)

  25. Ntirogiannis, K., Gatos, B., Pratikakis, I.: An objective evaluation methodology for document image binarization techniques. In: DAS, pp. 217–224 (2008)

  26. Otsu, N.: A threshold selection method from gray level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979)

  27. Pan, Y.F., Hou, X., Liu, C.L.: Text localization in natural scene images based on conditional random field. In: ICDAR, pp. 6–10 (2009)

  28. Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICDAR 2011 document image binarization contest (DIBCO 2011). In: ICDAR, pp. 1506–1510 (2011)

  29. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recogn. 33, 225–236 (2000)

  30. Wakahara, T., Kita, K.: Binarization of color character strings in scene images using k-means clustering and support vector machines. In: ICDAR, pp. 274–278 (2011)

  31. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain (2011)

  32. Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: 21st International Conference on Pattern Recognition (ICPR), pp. 3304–3308 (2012)

  33. Wolf, C., Doermann, D.: Binarization of low quality text using a Markov random field model. In: Proceedings of International Conference on Pattern Recognition, pp. 160–163 (2002)

  34. Wolf, C., Jolion, J.M.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recogn. 8(4), 280–296 (2006)

  35. Yamazoe, T., Etoh, M., Yoshimura, T., Tsujino, K.: Hypothesis preservation approach to scene text recognition with weighted finite-state transducer. In: ICDAR, pp. 359–363 (2011)

  36. Yang, Q., Tan, K.H., Ahuja, N.: Real-time O(1) bilateral filtering. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 557–564. IEEE (2009)

  37. Yang, Q.: Recursive bilateral filtering. In: ECCV (1), pp. 399–413 (2012)

  38. Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: CVPR (2012)

  39. Yoon, K.J., Kweon, I.S.: Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006)

  40. Zhu, K., Qi, F., Jiang, R., Xu, L., Kimachi, M., Wu, Y., Aizawa, T.: Using AdaBoost to detect and segment characters from natural scenes. In: Camera-Based Document Analysis and Recognition (CBDAR) (2005)

Author information

Correspondence to Sergey Milyaev.

About this article

Cite this article

Milyaev, S., Barinova, O., Novikova, T. et al. Fast and accurate scene text understanding with image binarization and off-the-shelf OCR. IJDAR 18, 169–182 (2015). https://doi.org/10.1007/s10032-015-0240-4

  • DOI: https://doi.org/10.1007/s10032-015-0240-4