PIEED: Position information enhanced encoder-decoder framework for scene text recognition

Published in: Applied Intelligence

Abstract

Scene text recognition (STR) has developed rapidly with the rise of deep learning. Recently, encoder-decoder frameworks based on the attention mechanism have been widely used in STR to improve recognition. However, the Long Short-Term Memory (LSTM) network commonly used in these frameworks tends to ignore certain position or visual information. To address this problem, we propose a Position Information Enhanced Encoder-Decoder (PIEED) framework for scene text recognition, in which an additional position information enhancement (PIE) module compensates for this shortcoming of the LSTM network. The module retains more position information in the feature sequence, alongside the context information extracted by the LSTM network, which helps improve recognition accuracy for text without context. In addition, our fusion decoder makes full use of the outputs of both the proposed module and the LSTM network, independently learning and preserving useful features, which improves recognition accuracy without increasing the number of parameters. The overall framework can be trained end-to-end using only images and ground truth. Extensive experiments on several benchmark datasets demonstrate that the proposed framework surpasses state-of-the-art methods on both regular and irregular text recognition.
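The abstract does not specify how the PIE module or the fusion decoder are built, so the following is only a rough illustration of the general idea of keeping a position-aware branch next to a context branch and fusing the two without extra learned weights. Everything in it is an assumption made for illustration: the sinusoidal encoding, the running-mean stand-in for the LSTM, the sigmoid gate, and all function names are hypothetical and are not taken from the paper.

```python
import math


def sinusoidal_position(t, dim):
    """Sinusoidal encoding for step t (assumed here; not PIEED's actual design)."""
    return [
        math.sin(t / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(t / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def position_branch(features):
    """Hypothetical position branch: add a position encoding to every step."""
    dim = len(features[0])
    return [
        [f + p for f, p in zip(step, sinusoidal_position(t, dim))]
        for t, step in enumerate(features)
    ]


def context_branch(features):
    """Stand-in for the LSTM: a running mean over the steps seen so far."""
    totals = [0.0] * len(features[0])
    out = []
    for t, step in enumerate(features, start=1):
        totals = [a + b for a, b in zip(totals, step)]
        out.append([a / t for a in totals])
    return out


def gated_fusion(pos_feats, ctx_feats):
    """Hypothetical fusion: a weight-free sigmoid gate blends the two branches."""
    fused = []
    for p_step, c_step in zip(pos_feats, ctx_feats):
        row = []
        for p, c in zip(p_step, c_step):
            gate = 1.0 / (1.0 + math.exp(-(p - c)))  # gate in (0, 1)
            row.append(gate * p + (1.0 - gate) * c)  # convex combination
        fused.append(row)
    return fused


# Toy feature sequence: 3 time steps, 2 channels each.
feats = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
pos = position_branch(feats)
ctx = context_branch(feats)
fused = gated_fusion(pos, ctx)
```

Because the gate in this sketch has no trainable weights, fusing the two branches adds no parameters, which loosely mirrors the abstract's claim; the real decoder may work quite differently.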

Corresponding author

Correspondence to Kai He.

Cite this article

Ma, X., He, K., Zhang, D. et al. PIEED: Position information enhanced encoder-decoder framework for scene text recognition. Appl Intell 51, 6698–6707 (2021). https://doi.org/10.1007/s10489-021-02219-3

  • DOI: https://doi.org/10.1007/s10489-021-02219-3
