Abstract
Text recognition in the wild is a challenging task in computer vision and machine learning. Existing optical character recognition engines perform poorly on natural scenes. In this context, deep learning models have emerged as a powerful state-of-the-art technique for classification and recognition. This study proposes a new Convolutional Neural Network based system for scene text reading. We investigate how to combine a character recognition module with a subsequent word recognition module to achieve the overall system goal. The first module analyzes characters within multi-scale images by relying on a convolutional network followed by a fully connected network for character recognition. The second module relies on Viterbi search to find the closest word to a given character sequence. For greater precision, a bigram-based linguistic module is applied. The proposed system achieves state-of-the-art performance on three standard scene text recognition benchmarks: Chars74K, ICDAR 2003 and ICDAR 2013. In particular, this performance is demonstrated in terms of both character and word recognition accuracy, as well as speed, via a comparative study of different deep learning architectures.
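To make the word recognition step more concrete, the following is a minimal sketch of Viterbi decoding over per-position character scores combined with bigram transition scores. It is an illustration only and not the authors' implementation: the function name `viterbi_decode`, the `floor` smoothing value, and all probabilities in the toy example are assumptions, and the lexicon matching implied by "closest word" is omitted.

```python
import math


def viterbi_decode(emissions, bigram, alphabet, floor=1e-6):
    """Return the most likely character string.

    emissions: list of dicts mapping char -> classifier probability, one per position
    bigram:    dict mapping (prev_char, cur_char) -> transition probability
    floor:     hypothetical smoothing value for unseen characters or bigrams
    """
    # best[c] = (log-score of the best path ending in c, that path)
    best = {c: (math.log(emissions[0].get(c, floor)), c) for c in alphabet}
    for probs in emissions[1:]:
        step = {}
        for cur in alphabet:
            emit = math.log(probs.get(cur, floor))
            # Pick the predecessor maximizing path score plus transition score.
            score, path = max(
                (best[prev][0] + math.log(bigram.get((prev, cur), floor)), best[prev][1])
                for prev in alphabet
            )
            step[cur] = (score + emit, path + cur)
        best = step
    return max(best.values())[1]


# Toy example (made-up numbers): the classifier slightly prefers "o" at position 1.
alphabet = "acot"
emissions = [{"c": 0.9, "o": 0.1}, {"a": 0.45, "o": 0.55}, {"t": 0.9}]
bigram = {("c", "a"): 0.5, ("a", "t"): 0.5, ("c", "o"): 0.1, ("o", "t"): 0.1}

greedy = "".join(max(p, key=p.get) for p in emissions)
decoded = viterbi_decode(emissions, bigram, alphabet)
print(greedy, decoded)
```

In this toy case, picking the top character at each position independently yields "cot", while the bigram scores steer the Viterbi path to "cat". This is the kind of correction a linguistic module is meant to provide on top of raw character recognition.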
Acknowledgments
This work was carried out with the support of the Ministry of Higher Education and Scientific Research and within the framework of Tunisian-Indian cooperation in the field of scientific research and technology.
About this article
Cite this article
Harizi, R., Walha, R., Drira, F. et al. Convolutional neural network with joint stepwise character/word modeling based system for scene text recognition. Multimed Tools Appl 81, 3091–3106 (2022). https://doi.org/10.1007/s11042-021-10663-z
DOI: https://doi.org/10.1007/s11042-021-10663-z