Scene Text Recognition: An Overview

  • Conference paper
Pattern Recognition and Artificial Intelligence (ICPRAI 2022)

Abstract

Recent years have witnessed increasing interest, in both academia and industry, in recognizing text in natural scenes, owing to the rich semantic information that text carries. With the rapid development of deep learning, text recognition in natural scenes, also known as scene text recognition (STR), has made breakthrough progress. However, noise interference in natural scenes, such as extreme illumination and occlusion, along with other factors, still poses huge challenges. Recent research has shown promising results in terms of accuracy and efficiency. To present the entire picture of the field of STR, this paper tries to: 1) summarize the fundamental problems of STR and the progress of representative STR algorithms in recent years; 2) analyze and compare their advantages and disadvantages; 3) point out directions for future work to inspire future research.

Acknowledgments

This work was supported by the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2020B1212060032) and the National Natural Science Foundation of China (Grant nos. 11971491, 11471012).

Author information

Corresponding author

Correspondence to Jun Tan.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Liang, S., Bi, N., Tan, J. (2022). Scene Text Recognition: An Overview. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13363. Springer, Cham. https://doi.org/10.1007/978-3-031-09037-0_27

  • DOI: https://doi.org/10.1007/978-3-031-09037-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09036-3

  • Online ISBN: 978-3-031-09037-0

  • eBook Packages: Computer Science, Computer Science (R0)
