Scene Text Recognition: An Overview

Liang, Shiqi; Bi, Ning; Tan, Jun

doi:10.1007/978-3-031-09037-0_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13363)

Included in the following conference series:

International Conference on Pattern Recognition and Artificial Intelligence


Abstract

Recent years have witnessed increasing interest, in both academia and industry, in recognizing text in natural scenes, owing to the rich semantic information that text carries. With the rapid development of deep learning, text recognition in natural scenes, also known as scene text recognition (STR), has made breakthrough progress. However, noise in natural scenes, such as extreme illumination and occlusion, along with other factors, poses huge challenges to STR. Recent research has shown promising results in terms of accuracy and efficiency. To present a complete picture of the field of STR, this paper tries to: 1) summarize the fundamental problems of STR and the progress of representative STR algorithms in recent years; 2) analyze and compare their advantages and disadvantages; 3) point out directions for future work to inspire future research.




Acknowledgments

This work was supported by the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2020B1212060032) and the National Natural Science Foundation of China (Grant nos. 11971491, 11471012).

Author information

Authors and Affiliations

School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, 510275, People’s Republic of China
Shiqi Liang, Ning Bi & Jun Tan
Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, 510275, People’s Republic of China
Ning Bi & Jun Tan

Authors

Shiqi Liang
Ning Bi
Jun Tan

Corresponding author

Correspondence to Jun Tan.

Editor information

Editors and Affiliations

Télécom SudParis, Palaiseau, France
Mounîm El Yacoubi
École de Technologie Supérieure, Montreal, QC, Canada
Eric Granger
Hong Kong Baptist University, Kowloon, Hong Kong
Pong Chi Yuen
Indian Statistical Institute, Kolkata, India
Umapada Pal
Université Paris Cité, Paris, France
Nicole Vincent


About this paper

Cite this paper

Liang, S., Bi, N., Tan, J. (2022). Scene Text Recognition: An Overview. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13363. Springer, Cham. https://doi.org/10.1007/978-3-031-09037-0_27


DOI: https://doi.org/10.1007/978-3-031-09037-0_27
Published: 02 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09036-3
Online ISBN: 978-3-031-09037-0
eBook Packages: Computer Science (R0)
