
A knowledge-based recognition system for historical Mongolian documents

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that cannot be segmented into glyph-units or do not require segmentation are recognized using the holistic scheme. The remaining words are recognized using the segmentation-based scheme, which is the focus of this paper. We exploit the knowledge of the glyph characteristics to segment words into glyph-units in the segmentation-based scheme. Convolutional neural networks are employed not only for word recognition in the holistic scheme, but also for glyph-unit recognition in the segmentation-based scheme. Based on the analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition. These strategies involve incorporating baseline information, glyph-unit grouping, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86 % word accuracy on the Mongolian Kanjur test samples.
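The two-scheme routing described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `WordImage`, `recognize_word`, and `word_accuracy`, the `needs_segmentation` flag, and the callable interfaces are hypothetical, and the classifiers and segmenter are passed in as plain functions rather than the paper's convolutional networks or knowledge-based segmenter. The three enhancement strategies (baseline information, glyph-unit grouping, handling of under- and over-segmented fragments) are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WordImage:
    """Hypothetical container for one word image."""
    pixels: object            # placeholder for the image data
    needs_segmentation: bool  # assumed to come from an upstream word-type check


def recognize_word(
    word: WordImage,
    holistic_classifier: Callable[[WordImage], str],
    segmenter: Callable[[WordImage], Sequence[object]],
    glyph_classifier: Callable[[object], str],
) -> str:
    """Route a word to the holistic or the segmentation-based scheme."""
    if not word.needs_segmentation:
        # Holistic scheme: classify the whole word image in one step.
        return holistic_classifier(word)
    # Segmentation-based scheme: split into glyph-units, classify each
    # unit, and concatenate the labels in reading order.
    glyph_units = segmenter(word)
    return "".join(glyph_classifier(unit) for unit in glyph_units)


def word_accuracy(predicted: List[str], truth: List[str]) -> float:
    """Fraction of words whose predicted label equals the ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("at least one word is required")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)
```

Under this sketch, a word counts as correct only when its full predicted label matches the ground truth, which is one common reading of "word accuracy"; the paper's exact evaluation protocol is not stated in the abstract.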


Acknowledgments

This work was funded by the National Natural Science Foundation of China (Grant Nos. 61263037, 61463038, and 61563040) and the Research Project of Higher Education School of Inner Mongolia Autonomous Region of China (Grant No. NJZY14007).

Author information

Corresponding author

Correspondence to Guanglai Gao.

About this article

Cite this article

Su, X., Gao, G., Wei, H. et al. A knowledge-based recognition system for historical Mongolian documents. IJDAR 19, 221–235 (2016). https://doi.org/10.1007/s10032-016-0267-1

  • DOI: https://doi.org/10.1007/s10032-016-0267-1
