A Multi-feature Fusion Approach for Words Recognition of Ancient Mongolian Documents

Sun, Shiwen; Wei, Hongxi; Wang, Yiming; He, Chao

doi:10.1007/978-981-97-8511-7_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15037)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Ancient Mongolian documents are valuable repositories of historical information and carry cultural significance. Analyzing them effectively requires dedicated recognition research, and because some words are absent from current lexicons, recognizing out-of-vocabulary (OOV) words is crucial. To better recognize ancient Mongolian documents, this paper proposes an end-to-end approach based on multi-feature fusion, called the Ancient Mongolian Documents Recognition Unit (AMDRU). The approach improves the model's understanding of images in ancient documents by leveraging information from word images at different scales. AMDRU takes word images as input and processes them with a custom-designed feature extractor that captures multi-scale structural details. These features are fed into an encoder that uses the efficient additive attention mechanism, yielding a better representation of the essential information. The encoded features are then passed to a Transformer decoder, which converts the image representation into text and outputs the predicted character string. To address the uneven data distribution in ancient documents and strengthen learning from rare word images, the asymmetric loss is used, which significantly improves the model's ability to learn from word images and boosts recognition performance. Experimental results demonstrate that the proposed approach captures the structural features of characters in ancient Mongolian documents more accurately and outperforms existing methods in recognition performance. The improvement is particularly notable on the challenging task of recognizing OOV words.
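As an illustration of the asymmetric loss mentioned in the abstract, the following is a minimal sketch of its general form as published for multi-label classification (Ridnik et al., 2021). The abstract does not say how AMDRU applies the loss to character prediction, so the function name `asymmetric_loss`, the per-element formulation, and the default values of `gamma_pos`, `gamma_neg`, and `margin` are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-element asymmetric losses (illustrative sketch).

    probs   -- predicted probabilities in [0, 1], one per label
    targets -- 1 for a positive label, 0 for a negative label
    The hyperparameter defaults are assumptions for this example.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: focal-style weight (1 - p)^gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative term: shift the probability down by the margin so that
            # very easy negatives (p <= margin) contribute nothing, then
            # down-weight the rest with p_m^gamma_neg.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    # One positive label and two negatives, one easy and one hard.
    print(asymmetric_loss([0.9, 0.04, 0.9], [1, 0, 0]))
```

Because negatives are focused more aggressively than positives (`gamma_neg` larger than `gamma_pos`), the many easy negatives contribute little, which leaves more of the gradient for rare positive classes. This matches the abstract's stated motivation of handling uneven data distribution.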

Acknowledgment

This study is supported by the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2024MS06029, and the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05.

Author information

Authors and Affiliations

School of Computer Science, Inner Mongolia University, Hohhot, 010010, China
Shiwen Sun, Hongxi Wei, Yiming Wang & Chao He
Provincial Key Laboratory of Mongolian Information Processing Technology, Hohhot, 010010, China
Shiwen Sun, Hongxi Wei, Yiming Wang & Chao He
National and Local Joint Engineering Research Center of Mongolian Information Processing Technology, Hohhot, 010010, China
Shiwen Sun, Hongxi Wei, Yiming Wang & Chao He

Corresponding author

Correspondence to Hongxi Wei.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Zhouchen Lin
Nankai University, Tianjin, China
Ming-Ming Cheng
Chinese Academy of Sciences, Beijing, China
Ran He
Xinjiang University, Ürümqi, Xinjiang, China
Kurban Ubul
Xinjiang University, Ürümqi, China
Wushouer Silamu
Peking University, Beijing, China
Hongbin Zha
Tsinghua University, Beijing, China
Jie Zhou
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu

About this paper

Cite this paper

Sun, S., Wei, H., Wang, Y., He, C. (2025). A Multi-feature Fusion Approach for Words Recognition of Ancient Mongolian Documents. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15037. Springer, Singapore. https://doi.org/10.1007/978-981-97-8511-7_24

DOI: https://doi.org/10.1007/978-981-97-8511-7_24
Published: 03 November 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8510-0
Online ISBN: 978-981-97-8511-7
eBook Packages: Computer Science, Computer Science (R0)
