Text recognition in natural scenes based on deep learning

Jiang, Yi; Jiang, Zhongyu; He, Liang; Chen, Shuai

doi:10.1007/s11042-022-12024-w

Published: 16 February 2022

Volume 81, pages 10545–10559, (2022)

Multimedia Tools and Applications

Abstract

To address the problems of character segmentation and dictionary dependence in natural-scene text recognition, this paper proposes a recognition algorithm based on an attention mechanism and the connectionist temporal classification (CTC) loss. A convolutional neural network and a bidirectional long short-term memory network encode the image features, which avoids the vanishing-gradient problem that a recurrent neural network (RNN) suffers as the number of time steps increases. An Attention-CTC structure then decodes the feature sequence, which effectively solves the problem of unconstrained attention decoding. The algorithm requires no extra alignment processing or subsequent syntax processing, converges faster in training, and significantly improves the text recognition rate, so it is of research value for recognition accuracy. Experimental results show that the algorithm is robust to text images with blurred fonts and complex backgrounds.
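The abstract does not spell out the decoding rule or how the two decoder losses are combined, so the following is only a minimal sketch under stated assumptions. It shows best-path (greedy) CTC decoding, which illustrates why CTC needs neither character segmentation nor a separate alignment step, and a weighted sum of the CTC and attention losses as one possible way of constraining attention decoding. The blank index `BLANK = 0`, the weight `lam`, and the function names `ctc_greedy_decode` and `joint_loss` are all hypothetical and are not taken from the paper.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0  # assumed index of the CTC blank symbol (not stated in the abstract)


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs[t][k] is the probability of class k at time step t, where
    class 0 is the blank and class k > 0 maps to alphabet[k - 1].
    """
    # Pick the most likely class index at every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Collapse runs of identical indices into a single index.
    collapsed = [k for k, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps both of them.
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


def joint_loss(ctc_loss: float, attention_loss: float, lam: float = 0.5) -> float:
    """Hypothetical interpolation of the two decoder losses.

    lam weights the CTC term; the weighting used in the paper is unknown.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * attention_loss


# Seven frames over the classes (blank, 'a', 'b').
frames = [
    [0.80, 0.10, 0.10],  # blank
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank separates the two 'a' characters
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, merged)
]
decoded = ctc_greedy_decode(frames, "ab")
combined = joint_loss(2.0, 4.0, lam=0.25)
```

In this toy input the per-frame labels are blank, a, a, blank, a, b, b. Merging repeats and dropping blanks gives "aab" without any explicit character boundaries, and the blank between the second and third frames' label and the next "a" is what lets a doubled character survive the merge.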

Author information

Authors and Affiliations

Department of Communications Engineering, Harbin University of Science and Technology, 52 Xuefu Road, Nangang District, Harbin, Heilongjiang, China
Yi Jiang
School of Automation, Harbin University of Science and Technology, Harbin, China
Zhongyu Jiang & Shuai Chen
School of Software, Northwestern Polytechnical University, Xi’an, China
Liang He

Corresponding author

Correspondence to Zhongyu Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, Y., Jiang, Z., He, L. et al. Text recognition in natural scenes based on deep learning. Multimed Tools Appl 81, 10545–10559 (2022). https://doi.org/10.1007/s11042-022-12024-w

Received: 11 February 2021
Revised: 16 April 2021
Accepted: 03 January 2022
Published: 16 February 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-12024-w
