Abstract
Offline handwritten text recognition is a core task in document analysis and has attracted sustained research attention for decades. In this paper, we present a self-attention-based model for offline handwritten text-line recognition. The proposed model consists of three main components: a CNN feature extractor; an encoder combining a BLSTM network with a self-attention module; and a CTC decoder. The self-attention module complements the RNN in the encoder, helping it capture long-range and multi-level dependencies across the input sequence. Extensive experiments on two datasets, IAM Handwriting and Kuzushiji, show that the proposed model achieves higher accuracy than state-of-the-art models. Visualizations of the self-attention maps confirm that the mechanism does capture these dependencies.
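As context for the encoder design described above, the generic scaled dot-product self-attention operation that the model layers on top of the BLSTM can be sketched as follows. This is a minimal NumPy illustration of the standard mechanism, not the authors' exact implementation; the sequence length, feature dimension, and weight shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a feature sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # (T, T) score matrix: every time step attends to every other time step,
    # which is what lets the encoder model long-range dependencies.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = softmax(scores, axis=-1)  # attention map; each row sums to 1
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 8, 16  # e.g., 8 time steps of 16-dim encoder features
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

The attention map `A` is the quantity visualized in the paper's self-attention map figures: entry `A[i, j]` gives how strongly time step `i` attends to time step `j`, regardless of their distance in the sequence.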
Acknowledgments
This research is partially supported by Grants-in-Aid for Scientific Research (S) 18H05221 and (A) 18H03597.
© 2022 Springer Nature Switzerland AG
Cite this paper
Ly, N.T., Ngo, T.T., Nakagawa, M. (2022). A Self-attention Based Model for Offline Handwritten Text Recognition. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_27
Print ISBN: 978-3-031-02443-6
Online ISBN: 978-3-031-02444-3