
Attention Recurrent Neural Networks for Image-Based Sequence Text Recognition


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12046)

Abstract

Image-based sequence text recognition is an important research direction in the field of computer vision. In this paper, we propose a new model called Attention Recurrent Neural Networks (ARNNs) for image-based sequence text recognition. ARNNs embed the attention mechanism seamlessly into recurrent neural networks (RNNs) through an attention gate. The attention gate generates an end-to-end trainable gating signal, which enables ARNNs to adaptively focus on the important information. The proposed attention gate can be applied to any recurrent network, e.g., the standard RNN, Long Short-Term Memory (LSTM), and the Gated Recurrent Unit (GRU). Experimental results on several benchmark datasets demonstrate that ARNNs consistently improve previous approaches on image-based sequence text recognition tasks.
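The abstract describes the attention gate only at a high level and the full equations are not included in this preview. The sketch below shows one plausible reading of the idea: a sigmoid gate, computed from the current input and the previous hidden state, re-weights the input before it enters a standard LSTM cell. The class and attribute names (AttentionLSTMCell, attn_gate) and the exact gating formula are illustrative assumptions, not the authors' published formulation.

```python
# Hedged sketch only: this preview does not give the ARNN equations, so the
# gating below is one plausible interpretation of "an attention gate embedded
# in an LSTM cell", not the paper's exact method. Names are hypothetical.
import torch
import torch.nn as nn


class AttentionLSTMCell(nn.Module):
    """LSTM cell with an extra, end-to-end trainable attention gate that
    re-weights the input features conditioned on the previous hidden state."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Attention gate: scores each input dimension from (x_t, h_{t-1}).
        self.attn_gate = nn.Linear(input_size + hidden_size, input_size)
        # A standard LSTM cell consumes the attention-modulated input.
        self.cell = nn.LSTMCell(input_size, hidden_size)

    def forward(self, x_t, state):
        h_prev, c_prev = state
        # a_t = sigmoid(W_a [x_t; h_{t-1}]) acts as a soft, learnable mask.
        a_t = torch.sigmoid(self.attn_gate(torch.cat([x_t, h_prev], dim=-1)))
        x_tilde = a_t * x_t                      # focus on salient features
        h_t, c_t = self.cell(x_tilde, (h_prev, c_prev))
        return h_t, (h_t, c_t)


if __name__ == "__main__":
    batch, steps, feat, hidden = 4, 26, 512, 256  # e.g. CNN feature columns
    cell = AttentionLSTMCell(feat, hidden)
    h = torch.zeros(batch, hidden)
    c = torch.zeros(batch, hidden)
    xs = torch.randn(batch, steps, feat)
    for t in range(steps):
        out, (h, c) = cell(xs[:, t], (h, c))
    print(out.shape)  # torch.Size([4, 256])
```

Setting a_t to all ones recovers the plain LSTM cell, which is consistent with the abstract's claim that the gate can be attached to any recurrent cell (standard RNN, LSTM, or GRU) without otherwise changing it.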

Supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, the National Natural Science Foundation of China (NSFC) under Grant No. 41706010, the Joint Fund of the Equipments Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Graduate Education Reform and Research Project of Ocean University of China under Grant No. HDJG19001, and the Fundamental Research Funds for the Central Universities of China.



Author information


Corresponding author

Correspondence to Guoqiang Zhong.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhong, G., Yue, G. (2020). Attention Recurrent Neural Networks for Image-Based Sequence Text Recognition. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_56


  • DOI: https://doi.org/10.1007/978-3-030-41404-7_56


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41403-0

  • Online ISBN: 978-3-030-41404-7

  • eBook Packages: Computer Science (R0)
