Abstract
In computer vision, scene text component recognition is an important problem in end-to-end scene text reading systems. It involves two major sub-problems: segmentation of such components into individual scene characters, and classification of the segmented characters into known character classes. The field has attracted significant and increasingly focused research effort, and reasonable progress has already been made, although many challenges remain to be addressed, such as background complexity, variety of text appearance, noise, blur, distortion, and various other degradation and deformation issues. In this paper, we present (i) a detailed survey of the scene text component segmentation and/or recognition methods reported so far in the literature, (ii) the related datasets available for quantitative evaluation and benchmarking of segmentation and/or recognition performance, (iii) comparative results and analysis of the reported methods, and (iv) a discussion of open areas that need to be investigated to achieve the goal of end-to-end scene text recognition. This paper is thus intended to serve as a reference for researchers working on scene text component segmentation and recognition.
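The two sub-problems can be illustrated with a deliberately simple sketch: a 4-connected component labelling pass segments an already binarized image into character candidates, and a 1-nearest-neighbour template match classifies each candidate. This is not a method proposed in this survey. It assumes clean binarization, characters that do not touch, and a tiny hypothetical template set (the labels `I`, `L` and `O`); all function names are illustrative only.

```python
from collections import deque


def parse(rows):
    """Turn strings like '#.#' into a binary image (1 = text pixel)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def segment_components(img):
    """Segmentation step: split a binary image into 4-connected components.

    Returns inclusive bounding boxes (top, left, bottom, right) sorted
    left to right, a crude reading order for a single horizontal line.
    Assumes (hypothetically) that characters do not touch each other.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited text pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda b: b[1])


def normalize(img, box, size=5):
    """Resample a component's bounding box to a size x size grid
    (nearest-neighbour sampling), giving a scale-normalized feature vector."""
    top, left, bottom, right = box
    bh, bw = bottom - top + 1, right - left + 1
    return tuple(
        img[top + (i * bh) // size][left + (j * bw) // size]
        for i in range(size)
        for j in range(size)
    )


def classify(vec, templates):
    """Classification step: 1-nearest neighbour under Hamming distance."""
    return min(
        templates,
        key=lambda label: sum(a != b for a, b in zip(vec, templates[label])),
    )


def read(img, templates, size=5):
    """Segment, then classify each component from left to right."""
    return "".join(
        classify(normalize(img, box, size), templates)
        for box in segment_components(img)
    )


# Hypothetical reference glyphs for the labels I, L and O.
REF = parse([
    "#.#...###",
    "#.#...#.#",
    "#.#...#.#",
    "#.#...#.#",
    "#.###.###",
])
templates = {
    label: normalize(REF, box)
    for label, box in zip("ILO", segment_components(REF))
}

print(read(REF, templates))  # ILO
```

Because every crop is resampled to the same grid, upscaling the whole image by an integer factor (for example, repeating each pixel twice in both directions) yields identical feature vectors in this sketch, so `read` still returns `"ILO"`. Nothing here addresses the blur, distortion, touching characters, or cluttered backgrounds mentioned above, which are the conditions that make real scene text segmentation and recognition hard.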
Acknowledgements
The authors are thankful to the Department of Computer Science and Engineering, Aliah University, Kolkata, India, for supporting this research work. P. Sengupta is grateful to the Dept. of MA & ME, Govt. of West Bengal, for providing the Swami Vivekananda Merit cum Means Fellowship.
About this article
Cite this article
Sengupta, P., Mollah, A.F. Journey of scene text components recognition: Progress and open issues. Multimed Tools Appl 80, 6079–6104 (2021). https://doi.org/10.1007/s11042-020-09862-x
DOI: https://doi.org/10.1007/s11042-020-09862-x