Convolutional neural network with joint stepwise character/word modeling based system for scene text recognition

Harizi, Riadh; Walha, Rim; Drira, Fadoua; Zaied, Mourad

doi:10.1007/s11042-021-10663-z

1167: Data Science on Multimedia Data: Challenges and Applications
Published: 17 March 2021

Volume 81, pages 3091–3106, (2022)
Multimedia Tools and Applications

Riadh Harizi¹,
Rim Walha ORCID: orcid.org/0000-0002-0483-6329¹,
Fadoua Drira¹ &
…
Mourad Zaied²

Abstract

Text recognition in the wild is a challenging task in computer vision and machine learning. Existing optical character recognition engines perform poorly on natural scene images. In this context, deep learning models have emerged as a powerful state-of-the-art technique for classification and recognition. This study proposes a new Convolutional Neural Network based system for scene text reading. We investigate how to combine a character recognition module with a subsequent word recognition module to achieve the overall system goal. The first module analyzes characters within multi-scale images by relying on the power of the convolutional network and the fully connected network for character recognition. The second module relies on the Viterbi search to find the closest word to a given character sequence. For greater precision, a bigram-based linguistic module is applied. The proposed system achieves state-of-the-art performance on three standard scene text recognition benchmarks: chars74k, ICDAR 2003 and ICDAR 2013. In particular, a comparative study of different deep learning architectures demonstrates this performance in terms of both character and word recognition accuracy and speed.
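
As a rough illustration of the word recognition step described in the abstract, the sketch below runs a Viterbi search that combines per-position character scores with bigram transition scores. The abstract does not give the authors' formulation, so the function name `viterbi_decode`, the toy alphabet, the probability floor, and all probability values are assumptions made for illustration only. Lexicon matching is omitted.

```python
import math


def viterbi_decode(emissions, bigram, alphabet, floor=1e-6):
    """Return the most probable character sequence (illustrative sketch).

    emissions: list of dicts, one per character position, mapping
               character -> classifier probability (hypothetical values).
    bigram:    dict mapping (prev_char, char) -> transition probability.
    floor:     probability used for anything missing from the tables.
    """
    if not emissions:
        return ""

    def log_p(p):
        return math.log(max(p, floor))

    # Best log-score of any path ending in each character at position 0.
    scores = {c: log_p(emissions[0].get(c, 0.0)) for c in alphabet}
    backpointers = []

    for obs in emissions[1:]:
        new_scores, pointers = {}, {}
        for c in alphabet:
            # Choose the predecessor maximising path score + bigram score.
            best_prev = max(
                alphabet,
                key=lambda p: scores[p] + log_p(bigram.get((p, c), 0.0)),
            )
            new_scores[c] = (
                scores[best_prev]
                + log_p(bigram.get((best_prev, c), 0.0))
                + log_p(obs.get(c, 0.0))
            )
            pointers[c] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final character.
    last = max(alphabet, key=lambda c: scores[c])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Toy data: the classifier slightly prefers "l" in the middle position,
# but the bigram scores favour the sequence "cat".
alphabet = ["a", "c", "l", "t"]
emissions = [
    {"c": 0.9, "a": 0.05, "l": 0.03, "t": 0.02},
    {"l": 0.5, "a": 0.45, "c": 0.03, "t": 0.02},
    {"t": 0.9, "a": 0.05, "c": 0.03, "l": 0.02},
]
bigram = {("c", "a"): 0.6, ("a", "t"): 0.6, ("c", "l"): 0.1, ("l", "t"): 0.1}
decoded = viterbi_decode(emissions, bigram, alphabet)
print(decoded)
```

With an empty bigram table every transition receives the same floor score, so the search falls back to the per-position classifier choice. This is one way to see how much a linguistic model can contribute in such a pipeline.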

Acknowledgments

This work was carried out with the support of the Ministry of Higher Education and Scientific Research and within the framework of Tunisian-Indian cooperation in the field of scientific research and technology.

Author information

Authors and Affiliations

REGIM-Lab, ENIS, University of Sfax, BP 1173, 3038, Sfax, Tunisia
Riadh Harizi, Rim Walha & Fadoua Drira
Research Team on Intelligent Machines, ENIG, University of Gabes, Gabes, Tunisia
Mourad Zaied

Corresponding author

Correspondence to Rim Walha.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Harizi, R., Walha, R., Drira, F. et al. Convolutional neural network with joint stepwise character/word modeling based system for scene text recognition. Multimed Tools Appl 81, 3091–3106 (2022). https://doi.org/10.1007/s11042-021-10663-z

Received: 24 January 2020
Revised: 03 December 2020
Accepted: 04 February 2021
Published: 17 March 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11042-021-10663-z
