Journey of scene text components recognition: Progress and open issues

Multimedia Tools and Applications

Abstract

In computer vision, scene text component recognition is an important problem in end-to-end scene text reading systems. It involves two major sub-problems: segmentation of such components into scene characters, and classification of the segmented characters into known character classes. The field has attracted significant and increasingly focused research effort, and reasonable progress has already been made, although a range of challenges such as background complexity, variety of text appearance, noise, blur, distortion, and other degradation and deformation issues remain to be addressed. In this paper, we present (i) a detailed survey of the scene text component segmentation and/or recognition methods reported so far in the literature, (ii) related datasets available for quantitative evaluation and benchmarking of segmentation and/or recognition performance, (iii) comparative results and analysis of the reported methods, and (iv) a discussion of open areas to be looked into in order to achieve the desired goal of end-to-end scene text recognition. This paper also aims to serve as a reference for researchers in the area of scene text component segmentation and recognition.

Acknowledgements

The authors are thankful to the Department of Computer Science and Engineering of Aliah University, Kolkata, India for providing every kind of support for carrying out this research work. P. Sengupta is grateful to Dept. of MA & ME, Govt. of West Bengal for providing Swami Vivekananda Merit cum Means Fellowship.

Author information

Corresponding author

Correspondence to Ayatullah Faruk Mollah.

About this article

Cite this article

Sengupta, P., Mollah, A.F. Journey of scene text components recognition: Progress and open issues. Multimed Tools Appl 80, 6079–6104 (2021). https://doi.org/10.1007/s11042-020-09862-x

  • DOI: https://doi.org/10.1007/s11042-020-09862-x
