Abstract
Offline handwritten text recognition is a core task in document analysis and has attracted sustained research attention for decades. In this paper, we present a self-attention-based model for offline handwritten text-line recognition. The proposed model consists of three main components: a CNN feature extractor; an encoder combining a BLSTM network with a self-attention module; and a CTC decoder. The self-attention module complements the RNN in the encoder, helping it capture long-range and multi-level dependencies across the input sequence. Extensive experiments on two datasets, IAM Handwriting and Kuzushiji, show that the proposed model achieves higher accuracy than state-of-the-art models. Visualizations of the self-attention maps confirm that the mechanism does capture these dependencies.
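As context for the encoder design described above, the generic scaled dot-product self-attention operation that the model layers on top of the BLSTM can be sketched as follows. This is a minimal NumPy illustration of the standard mechanism, not the authors' exact implementation; the sequence length, feature dimension, and weight shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a feature sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # (T, T) score matrix: every time step attends to every other time step,
    # which is what lets the encoder model long-range dependencies.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = softmax(scores, axis=-1)  # attention map; each row sums to 1
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 8, 16  # e.g., 8 time steps of 16-dim encoder features
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

The attention map `A` is the quantity visualized in the paper's self-attention map figures: entry `A[i, j]` gives how strongly time step `i` attends to time step `j`, regardless of their distance in the sequence.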
Acknowledgments
This research is partially supported by Grants-in-Aid for Scientific Research (S) 18H05221 and (A) 18H03597.
© 2022 Springer Nature Switzerland AG
Cite this paper
Ly, N.T., Ngo, T.T., Nakagawa, M. (2022). A Self-attention Based Model for Offline Handwritten Text Recognition. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_27
Print ISBN: 978-3-031-02443-6
Online ISBN: 978-3-031-02444-3