Abstract
Image-based sequence text recognition is an important research direction in computer vision. In this paper, we propose a new model, Attention Recurrent Neural Networks (ARNNs), for image-based sequence text recognition. ARNNs embed the attention mechanism seamlessly into recurrent neural networks (RNNs) through an attention gate. The attention gate generates an end-to-end trainable gating signal, which enables ARNNs to focus adaptively on the important information. The proposed attention gate can be applied to any recurrent network, e.g., a standard RNN, Long Short-Term Memory (LSTM), or a Gated Recurrent Unit (GRU). Experimental results on several benchmark datasets demonstrate that ARNNs consistently improve on previous approaches to image-based sequence text recognition.
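The abstract does not give the form of the attention gate, so the following is only a minimal sketch of one plausible reading: a sigmoid gate computed from the current input and the previous hidden state, applied element-wise to the input of a standard RNN cell. The gate equation, the class name `AttentionGatedRNNCell`, and all parameter names (`Wa`, `Ua`, `ba`, `W`, `U`, `b`) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the initialisation scheme is arbitrary here."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AttentionGatedRNNCell:
    """Standard RNN cell with an ASSUMED element-wise attention gate on its input."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        # Gate parameters (hypothetical): a_t = sigmoid(Wa x_t + Ua h_{t-1} + ba)
        self.Wa = rand_matrix(input_size, input_size, rng)
        self.Ua = rand_matrix(input_size, hidden_size, rng)
        self.ba = [0.0] * input_size
        # Recurrent parameters: h_t = tanh(W (a_t * x_t) + U h_{t-1} + b)
        self.W = rand_matrix(hidden_size, input_size, rng)
        self.U = rand_matrix(hidden_size, hidden_size, rng)
        self.b = [0.0] * hidden_size

    def step(self, x, h_prev):
        # Gating signal in (0, 1), one value per input feature.
        gate_pre = [p + q + r for p, q, r in
                    zip(matvec(self.Wa, x), matvec(self.Ua, h_prev), self.ba)]
        a = [sigmoid(v) for v in gate_pre]
        # Re-weight the input so that features with a larger gate value contribute more.
        x_att = [ai * xi for ai, xi in zip(a, x)]
        h_pre = [p + q + r for p, q, r in
                 zip(matvec(self.W, x_att), matvec(self.U, h_prev), self.b)]
        h = [math.tanh(v) for v in h_pre]
        return h, a


def run_sequence(cell, xs, hidden_size):
    """Run the cell over a sequence of feature vectors, starting from a zero state."""
    h = [0.0] * hidden_size
    states, gates = [], []
    for x in xs:
        h, a = cell.step(x, h)
        states.append(h)
        gates.append(a)
    return states, gates
```

Because the gate is built only from differentiable operations on the same inputs as the cell, its parameters could in principle be trained jointly with the rest of the network, which is what "end-to-end trainable" suggests. Training itself is not shown here, and the same gating step could be placed in front of an LSTM or GRU update instead of the plain `tanh` update used in this sketch.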
Supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, the National Natural Science Foundation of China (NSFC) under Grant No. 41706010, the Joint Fund of the Equipments Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Graduate Education Reform and Research Project of Ocean University of China under Grant No. HDJG19001, and the Fundamental Research Funds for the Central Universities of China.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhong, G., Yue, G. (2020). Attention Recurrent Neural Networks for Image-Based Sequence Text Recognition. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_56
DOI: https://doi.org/10.1007/978-3-030-41404-7_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41403-0
Online ISBN: 978-3-030-41404-7
eBook Packages: Computer Science (R0)