A Multi-feature Fusion Approach for Words Recognition of Ancient Mongolian Documents

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15037)

Abstract

Ancient Mongolian documents are valuable repositories of historical and cultural information. Analyzing them effectively requires dedicated recognition research, and because some of their words are absent from current lexicons, recognizing out-of-vocabulary (OOV) words is crucial. To better recognize these documents, this paper proposes an end-to-end approach based on multi-feature fusion, called Ancient Mongolian Documents Recognition Unit (AMDRU). The approach improves the model’s understanding of document images by leveraging information from word images at different scales. AMDRU takes word images as input and processes them with a custom-designed feature extractor that captures multi-scale structural details. These features are fed into an encoder that uses the efficient additive attention mechanism to better understand and represent essential information. The encoded features are then passed to a Transformer decoder, which converts the image features into text and outputs the predicted character string. To address the uneven data distribution in ancient documents and improve learning from rare word images, the asymmetric loss is used, which significantly improves the model’s ability to learn from word images and boosts recognition performance. Experimental results show that the proposed approach captures the structural features of characters in ancient Mongolian documents more accurately and outperforms existing methods in recognition performance. It performs particularly well on the challenging task of recognizing OOV words.
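
The abstract does not say how the asymmetric loss is applied inside AMDRU. As a rough illustration only, the sketch below follows the general form of the asymmetric loss introduced by Ridnik et al. for multi-label classification: positives and negatives receive different focusing exponents, and negative probabilities are shifted down by a margin so that easy negatives contribute almost nothing. The function name `asymmetric_loss` and the values `gamma_pos=0.0`, `gamma_neg=4.0`, and `margin=0.05` are assumptions chosen for this example, not settings reported for AMDRU.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-class asymmetric losses (illustrative sketch).

    probs   -- predicted probabilities in [0, 1], one per class
    targets -- 0/1 labels, one per class
    The hyperparameter defaults are assumptions for this example.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: focal-style weighting with exponent gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative term: shift the probability down by the margin so
            # very easy negatives (p <= margin) are discarded entirely.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


# A confidently wrong negative is penalized far more than an easy one.
easy_negative = asymmetric_loss([0.2], [0])
hard_negative = asymmetric_loss([0.9], [0])
print(easy_negative < hard_negative)
```

With a larger exponent on the negative term than on the positive one, the many easy negatives are down-weighted while the few positives keep their gradient, which is the property that makes a loss of this kind a plausible fit for unevenly distributed data.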

Acknowledgment

This study is supported by the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2024MS06029, and the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05.

Corresponding author

Correspondence to Hongxi Wei.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, S., Wei, H., Wang, Y., He, C. (2025). A Multi-feature Fusion Approach for Words Recognition of Ancient Mongolian Documents. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15037. Springer, Singapore. https://doi.org/10.1007/978-981-97-8511-7_24

  • DOI: https://doi.org/10.1007/978-981-97-8511-7_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8510-0

  • Online ISBN: 978-981-97-8511-7

  • eBook Packages: Computer Science, Computer Science (R0)
