Scene text detection and recognition: recent advances and future trends

Zhu, Yingying; Yao, Cong; Bai, Xiang

doi:10.1007/s11704-015-4488-0

Review Article
Published: 22 June 2015

Volume 10, pages 19–36 (2016)

Frontiers of Computer Science

Yingying Zhu¹,
Cong Yao¹ &
Xiang Bai¹

Abstract

Text, one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is very useful in a wide range of vision-based applications; text detection and recognition in natural scenes have therefore become important and active research topics in computer vision and document analysis. In recent years especially, the community has seen a surge of research effort and substantial progress in these fields, though a variety of challenges (e.g., noise, blur, distortion, occlusion, and variation) remain. The purposes of this survey are threefold: 1) to introduce up-to-date works, 2) to identify state-of-the-art algorithms, and 3) to predict potential research directions. Moreover, this paper provides comprehensive links to publicly available resources, including benchmark datasets, source code, and online demos. In summary, this literature review can serve as a good reference for researchers in the areas of scene text detection and recognition.

Author information

Authors and Affiliations

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, 430074, China
Yingying Zhu, Cong Yao & Xiang Bai

Corresponding author

Correspondence to Xiang Bai.

Additional information

Yingying Zhu received her BS in electronics and information engineering from Huazhong University of Science and Technology (HUST), China in 2011. She is currently a PhD student in the School of Electronic Information and Communications, HUST. Her research areas mainly include text/traffic sign detection and recognition in natural images.

Cong Yao received his BS and PhD in electronics and information engineering from Huazhong University of Science and Technology (HUST), China in 2008 and 2014, respectively. He was a visiting research scholar with Temple University, USA in 2013. His research has focused on computer vision and machine learning, in particular, the area of text detection and recognition in natural images.

Xiang Bai received his BS, MS, and PhD degrees from Huazhong University of Science and Technology (HUST), China in 2003, 2005, and 2009, respectively, all in electronics and information engineering. He is currently a professor in the School of Electronic Information and Communications, HUST, where he is also the Vice Director of the National Center of Anti-Counterfeiting Technology, China. His research interests include object recognition, shape analysis, scene text recognition, and intelligent systems.

About this article

Cite this article

Zhu, Y., Yao, C. & Bai, X. Scene text detection and recognition: recent advances and future trends. Front. Comput. Sci. 10, 19–36 (2016). https://doi.org/10.1007/s11704-015-4488-0

Received: 31 October 2014
Accepted: 09 March 2015
Published: 22 June 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s11704-015-4488-0
