Scene Text Recognition: No Country for Old Men?

Gómez, Lluís; Karatzas, Dimosthenis

doi:10.1007/978-3-319-16631-5_12

Lluís Gómez¹⁵ &
Dimosthenis Karatzas¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9009))

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

It is a generally accepted fact that Off-the-shelf OCR engines do not perform well in unconstrained scenarios like natural scene imagery, where text appears among the clutter of the scene. However, recent research demonstrates that a conventional shape-based OCR engine would be able to produce competitive results in the end-to-end scene text recognition task when provided with a conveniently preprocessed image. In this paper we confirm this finding with a set of experiments where two off-the-shelf OCR engines are combined with an open implementation of a state-of-the-art scene text detection framework. The obtained results demonstrate that in such pipeline, conventional OCR solutions still perform competitively compared to other solutions specifically designed for scene text recognition.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Scene text detection and recognition: recent advances and future trends

Article 22 June 2015

Text Detection and Recognition from the Scene Images Using RCNN and EasyOCR

KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark

Notes

1.
Word Lens and the Google Translate service are examples of a real applications of end-to-end scene text detection and recognition that have acquired market-level maturity.
2.
http://code.google.com/p/tesseract-ocr/.
3.
OCR Omnipage Professional, available at http://www.nuance.com/.
4.
Word list is provided by the Microsoft Web N-Gram Service (http://webngram.research.microsoft.com/info/) with top 100k frequently searched words on the Bing search engine.
5.
http://docs.opencv.org/trunk/modules/text/doc/erfilter.html.
6.
http://code.google.com/p/tesseract-ocr/.
7.
http://finereader.abbyy.com/.

References

Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: Photoocr: reading text in uncontrolled conditions. In: International Conference on Computer Vision (ICCV) (2013)
Google Scholar
Chen, X., Yuille, A.L.: Detecting and reading text in natural scenes. In: Computer Vision and Pattern Recognition (CVPR) (2004)
Google Scholar
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Computer Vision and Pattern Recognition (CVPR) (2010)
Google Scholar
Fujisawa, H.: Forty years of research in character and document recognition an industrial perspective. Pattern Recogn. 41, 2435–2446 (2008)
Article Google Scholar
Gomez, L., Karatzas, D.: Multi-script text extraction from natural scenes. In: International Conference on Document Analysis and Recognition (ICDAR) (2013)
Google Scholar
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Gomez, L., Robles, S., Mas, J., Fernandez, D., Almazan, J., de las Heras, L.P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition (ICDAR) (2013)
Google Scholar
Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: International Conference on Document Analysis and Recognition (ICDAR) (2003)
Google Scholar
Milyaev, S., Barinova, O., Novikova, T., Kohli, P., Lempitsky, V.: Image binarization for end-to-end text understanding in natural images. In: International Conference on Document Analysis and Recognition (ICDAR) (2013)
Google Scholar
Neumann, L., Matas, J.: A method for text localization and detection. In: Assian Conference on Computer Vision (ACCV) (2010)
Google Scholar
Neumann, L., Matas, J.: Text localization in real-world images using efficiently pruned exhaustive search. In: International Conference on Document Analysis and Recognition (ICDAR) (2011)
Google Scholar
Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Computer Vision and Pattern Recognition (CVPR) (2012)
Google Scholar
Neumann, L., Matas, J.: On combining multiple segmentations in scene text recognition. In: International Conference on Document Analysis and Recognition (ICDAR) (2013)
Google Scholar
Neumann, L., Matas, J.: Scene text localization and recognition with oriented stroke detection. In: International Conference on Computer Vision (ICCV) (2013)
Google Scholar
Novikova, T., Barinova, O., Kohli, P., Lempitsky, V.: Large-lexicon attribute-consistent text recognition in natural images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 752–765. Springer, Heidelberg (2012)
Chapter Google Scholar
Pan, Y.F., Hou, X., Liu, C.L.: Text localization in natural scene images based on conditional random field. In: International Conference on Document Analysis and Recognition (ICDAR) (2009)
Google Scholar
Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: International Conference on Document Analysis and Recognition (ICDAR) (2011)
Google Scholar
Smith, R.: An overview of the tesseract OCR engine. In: International Conference on Document Analysis and Recognition (ICDAR) (2007)
Google Scholar
Smith, R.: Limits on the application of frequency-based language models to OCR. In: International Conference on Document Analysis and Recognition (ICDAR) (2011)
Google Scholar
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Computer Vision and Pattern Recognition (CVPR) (2001)
Google Scholar
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV (2011)
Google Scholar
Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: International Conference on Pattern Recognition (ICPR) (2012)
Google Scholar
Yao, C., Bai, X., Liu, W.: A unified framework for multi-oriented text detection and recognition. In: IEEE Transactions on Image Processing (TIP) (2014)
Google Scholar
Yin, X.C., Yin, X., Huang, K., Hao, H.W.: Robust text detection in natural scene images. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2013)
Google Scholar

Download references

Acknowledgement

This project was supported by the Spanish project TIN2011-24631 the fellowship RYC-2009-05031, and the Catalan government scholarship 2013 FI1126. The authors want to thanks also Google Inc. for the support received through the GSoC project, as well as the OpenCV community, specially to Stefano Fabri and Vadim Pisarevsky, for their help in the implementation of the scene text detection module evaluated in this paper.

Author information

Authors and Affiliations

Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
Lluís Gómez & Dimosthenis Karatzas

Authors

Lluís Gómez
View author publications
You can also search for this author in PubMed Google Scholar
Dimosthenis Karatzas
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lluís Gómez .

Editor information

Editors and Affiliations

Center for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
C.V. Jawahar
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shiguang Shan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gómez, L., Karatzas, D. (2015). Scene Text Recognition: No Country for Old Men?. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science(), vol 9009. Springer, Cham. https://doi.org/10.1007/978-3-319-16631-5_12

Download citation

DOI: https://doi.org/10.1007/978-3-319-16631-5_12
Published: 11 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16630-8
Online ISBN: 978-3-319-16631-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Scene Text Recognition: No Country for Old Men?