
LABT: A Sequence-to-Sequence Model for Mongolian Handwritten Text Recognition with Local Aggregation BiLSTM and Transformer

  • Conference paper
Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

Mongolian handwritten text recognition is challenging due to the unique characteristics of the Mongolian script, its large vocabulary, and the presence of out-of-vocabulary (OOV) words. This paper proposes a model that uses a local aggregation BiLSTM for sequence modeling of visual features and a Transformer for word prediction. Specifically, we introduce a local aggregation operation into the BiLSTM (Bidirectional Long Short-Term Memory) that aggregates adjacent information at each time step to improve contextual understanding. The improved BiLSTM is able to capture context-dependent letter-shape variations that occur in different contexts. It effectively addresses the difficulty of accurately identifying variable letters and can generate OOV words without relying on a predefined word list during training. The contextual features extracted by the BiLSTM are passed through multiple layers of the Transformer's encoder and decoder. At each layer, the representations of the previous layers remain accessible, allowing the layered representations to be progressively refined. By using these hierarchical representations, accurate predictions can be made even in large-vocabulary text recognition tasks. Our proposed model achieves state-of-the-art performance on two commonly used Mongolian handwritten text recognition datasets.
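The local-aggregation idea described above can be sketched as follows. This is a minimal illustrative reading, not the authors' implementation: here a depthwise 1D convolution pools a small window of adjacent time steps into each position before a bidirectional LSTM models the sequence; the module name, window size, and residual combination are all assumptions.

```python
import torch
import torch.nn as nn

class LocalAggregationBiLSTM(nn.Module):
    """Hypothetical sketch: aggregate adjacent frames, then run a BiLSTM."""

    def __init__(self, feat_dim: int, hidden_dim: int, window: int = 3):
        super().__init__()
        # Depthwise 1D convolution pools a local window around each time
        # step; padding keeps the sequence length unchanged.
        self.aggregate = nn.Conv1d(
            feat_dim, feat_dim, kernel_size=window,
            padding=window // 2, groups=feat_dim)
        self.bilstm = nn.LSTM(
            feat_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) visual features from a CNN backbone
        local = self.aggregate(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.bilstm(x + local)  # inject local context residually
        return out  # (batch, time, 2 * hidden_dim), fed to the Transformer

feats = torch.randn(2, 40, 128)           # e.g. 40 frames of 128-d features
ctx = LocalAggregationBiLSTM(128, 256)(feats)
print(tuple(ctx.shape))                   # (2, 40, 512)
```

The bidirectional output (concatenated forward and backward states) would then serve as the memory for the Transformer encoder-decoder described in the abstract.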



Acknowledgement

This study is supported by the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2019ZD14, the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05, the fund of supporting the reform and development of local universities (Disciplinary Construction) and construction project of “Inner Mongolia Science and Technology Achievement Transfer and Transformation Demonstration Zone, University Collaborative Innovation Base, and University Entrepreneurship Training Base” (Supercomputing Power Project: 21300-231510).

Author information


Corresponding author

Correspondence to Hongxi Wei.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Y., Wei, H., Sun, S. (2024). LABT: A Sequence-to-Sequence Model for Mongolian Handwritten Text Recognition with Local Aggregation BiLSTM and Transformer. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14805. Springer, Cham. https://doi.org/10.1007/978-3-031-70536-6_21


  • DOI: https://doi.org/10.1007/978-3-031-70536-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70535-9

  • Online ISBN: 978-3-031-70536-6

  • eBook Packages: Computer Science, Computer Science (R0)
