
Synthetic Sample Extension in Implementation of Tangut Character Databases

Automatic Control and Computer Sciences

Abstract

The Tangut script was a logographic writing system used for the extinct Tangut language of the Western Xia Dynasty (1038 to 1227). Optical character recognition, machine learning, and computer vision techniques can greatly help in deciphering the characters in ancient scripts. All of these techniques, however, depend on a character database that provides training samples and test standards. While building the Tangut character databases with ancient Tangut scripts as the data source, it was found that imbalanced class distribution significantly compromises the performance of learning algorithms. This paper proposes a method of synthetic sample generation to improve the learning and recognition of Tangut characters. Recognition accuracy is compared between learning on the original data set and learning on the synthetically extended data set, and the comparison shows a clear advantage for the proposed method. The organization of the Tangut character databases is also described.
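
As a rough illustration of the general idea of extending under-represented classes with synthetic samples, the sketch below pads every class up to the size of the largest class by adding slightly shifted copies of existing glyph images. It is not the authors' method, which this page does not describe. The function names `jitter` and `balance_by_synthesis`, the `max_shift` parameter, and the use of small random translations as the deformation are all assumptions made for the example.

```python
import numpy as np


def jitter(image, rng, max_shift=2):
    """Return a copy of a 2-D glyph image shifted by a small random offset.

    Hypothetical stand-in for whatever deformation a real pipeline would use.
    """
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    # np.roll wraps pixels around the border; blank the wrapped rows/columns.
    if dy > 0:
        shifted[:dy, :] = 0
    elif dy < 0:
        shifted[dy:, :] = 0
    if dx > 0:
        shifted[:, :dx] = 0
    elif dx < 0:
        shifted[:, dx:] = 0
    return shifted


def balance_by_synthesis(images, labels, seed=0):
    """Add synthetic samples until every class matches the largest class."""
    rng = np.random.default_rng(seed)
    label_array = np.asarray(labels)
    classes, counts = np.unique(label_array, return_counts=True)
    target = int(counts.max())

    out_images = list(images)
    out_labels = label_array.tolist()
    for cls, count in zip(classes, counts):
        members = np.flatnonzero(label_array == cls)
        for _ in range(target - int(count)):
            # Pick a random real sample of this class and deform it.
            source = images[int(rng.choice(members))]
            out_images.append(jitter(source, rng))
            out_labels.append(cls.item())
    return out_images, out_labels
```

Calling `balance_by_synthesis(images, labels)` on a list of equally sized 2-D arrays and their class labels returns the original samples followed by the synthetic ones. In practice the synthetic samples should be generated from the training split only, so that accuracy on the test set still reflects real characters.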



Author information

Corresponding author

Correspondence to Yifei Meng.

Additional information

The article is published in the original.

About this article


Cite this article

Meng, Y., Yuan, X., Wei, X. et al. Synthetic Sample Extension in Implementation of Tangut Character Databases. Aut. Control Comp. Sci. 52, 334–343 (2018). https://doi.org/10.3103/S0146411618040089

  • DOI: https://doi.org/10.3103/S0146411618040089
