Abstract
Ancient Mongolian documents are valuable repositories of historical and cultural information. Analyzing them effectively requires dedicated recognition research, and because some words are missing from current lexicons, recognizing out-of-vocabulary (OOV) words is crucial. This paper proposes the Ancient Mongolian Documents Recognition Unit (AMDRU), an end-to-end approach based on multi-feature fusion. The approach improves the model's understanding of images in ancient documents by leveraging information from word images at different scales. AMDRU takes word images as input and processes them with a custom-designed feature extractor that captures multi-scale structural details. These features are fed into an encoder that uses the efficient additive attention mechanism, which helps the model understand and represent the essential information. The encoded features are then passed to a Transformer decoder, which converts the image representation into text and outputs the predicted character string. To address the uneven data distribution in ancient documents and improve learning on rare word images, the asymmetric loss is used, which significantly improves the model’s ability to learn from word images and boosts recognition performance. Experimental results demonstrate that the proposed approach captures the structural features of characters in ancient Mongolian documents more accurately and outperforms existing methods in recognition performance. The improvement is most pronounced on the challenging task of recognizing OOV words.
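The abstract does not spell out how the asymmetric loss is applied to the decoder outputs, so the following is only a minimal sketch of the loss in its original multi-label form (Ridnik et al., 2021), not the authors' implementation. The function name `asymmetric_loss` and the default values for `gamma_pos`, `gamma_neg`, and `margin` are illustrative assumptions. The sketch shows the two ideas that make the loss useful for imbalanced data: separate focusing exponents for positive and negative labels, and a probability margin that discards very easy negatives.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Asymmetric loss summed over independent binary labels.

    probs:   predicted probabilities in [0, 1], one per label.
    targets: ground-truth values (0 or 1), same length as probs.
    The hyperparameter defaults are illustrative assumptions only.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: focal-style down-weighting with gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative term: shift the probability down by the margin so
            # that easy negatives (p <= margin) contribute zero loss.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total
```

With `gamma_pos=0`, `gamma_neg=0`, and `margin=0`, the function reduces to ordinary binary cross-entropy. Raising `gamma_neg` above `gamma_pos` shrinks the contribution of the many easy negatives relative to the rare positives.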
Acknowledgment
This study is supported by the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2024MS06029, and the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sun, S., Wei, H., Wang, Y., He, C. (2025). A Multi-feature Fusion Approach for Words Recognition of Ancient Mongolian Documents. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15037. Springer, Singapore. https://doi.org/10.1007/978-981-97-8511-7_24
DOI: https://doi.org/10.1007/978-981-97-8511-7_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8510-0
Online ISBN: 978-981-97-8511-7
eBook Packages: Computer Science, Computer Science (R0)