
A Self-attention Based Model for Offline Handwritten Text Recognition

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13189)

Abstract

Offline handwritten text recognition is an important part of document analysis and has received attention from numerous researchers for decades. In this paper, we present a self-attention-based model for offline handwritten text-line recognition. The proposed model consists of three main components: a CNN feature extractor; an encoder combining a BLSTM network with a self-attention module; and a CTC decoder. The self-attention module complements the RNN in the encoder, helping it capture long-range and multi-level dependencies across the input sequence. In extensive experiments on the IAM Handwriting and Kuzushiji datasets, the proposed model achieves better accuracy than state-of-the-art models. Visualizations of the self-attention maps confirm that the mechanism does capture such dependencies.
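The self-attention module described above is, following Vaswani et al., scaled dot-product attention applied to the sequence of encoder outputs. A minimal stdlib-Python sketch of the mechanism (for brevity it uses Q = K = V = X; a trained model would apply learned projection matrices, whose sizes the abstract does not specify):

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over a sequence.

    X: list of T feature vectors (e.g. per-time-step BLSTM outputs).
    Returns (context vectors, T x T attention map). Every output step
    attends to every input step, which is how the module captures
    long-range dependencies that a purely recurrent encoder may miss.
    """
    d = len(X[0])
    scale = math.sqrt(d)
    # Pairwise similarity of each query step with every key step.
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / scale for k in X]
              for q in X]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    # Context vector = attention-weighted sum of value vectors.
    context = [[sum(w * v[j] for w, v in zip(wrow, X)) for j in range(d)]
               for wrow in weights]
    return context, weights

# Toy sequence of T=4 steps with d=3 features each (illustrative values).
X = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]
context, attn = self_attention(X)
print(len(context), len(context[0]))  # 4 3
```

The attention map `attn` is what the paper visualizes: row i shows how much time step i draws on every other time step of the line image's feature sequence.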



Acknowledgments

This research was partially supported by Grants-in-Aid for Scientific Research (S) 18H05221 and (A) 18H03597.

Author information

Correspondence to Nam Tuan Ly.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Ly, N.T., Ngo, T.T., Nakagawa, M. (2022). A Self-attention Based Model for Offline Handwritten Text Recognition. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_27


  • DOI: https://doi.org/10.1007/978-3-031-02444-3_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-02443-6

  • Online ISBN: 978-3-031-02444-3

