Synthetic Sample Extension in Implementation of Tangut Character Databases

Meng, Yifei; Yuan, Xue; Wei, Xueye; Yang, Wenhui; Chen, Yan

doi:10.3103/S0146411618040089

Published: 20 September 2018

Automatic Control and Computer Sciences, Volume 52, pages 334–343 (2018)

Abstract

The Tangut script was a logographic writing system used for the extinct Tangut language of the Western Xia dynasty (1038–1227). Optical character recognition, machine learning, and computer vision techniques can greatly help in deciphering the characters of ancient scripts, but all of these techniques depend on a character database that provides training samples and test standards. While building Tangut character databases from ancient Tangut texts, we found that an imbalanced class distribution significantly degrades the performance of learning algorithms. This paper proposes a method of synthetic sample generation to improve the learning and recognition of Tangut characters. A comparison of recognition accuracy between training on the original data set and training on the synthetically extended data set shows a marked improvement with the proposed method. The organization of the Tangut character databases is also described.
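The abstract does not say how the synthetic samples are generated. As a rough illustration of the general idea (topping up under-represented character classes with synthetic images), the sketch below assumes a simple random affine perturbation (small rotation, scaling, and shear) of existing glyph images. This is a stand-in technique, not necessarily the authors' method. The glyph format (a binary image stored as rows of 0/1 ints), the function names `random_affine` and `balance_classes`, and the parameter ranges are all hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical glyph format: a binary image as a list of rows of 0/1 ints.
Glyph = List[List[int]]


def random_affine(glyph: Glyph, rng: random.Random,
                  max_angle_deg: float = 8.0, max_scale: float = 0.08,
                  max_shear: float = 0.1) -> Glyph:
    """Return a randomly rotated, scaled and sheared copy of `glyph`.

    The output has the same size as the input. Each output pixel is filled by
    inverse-mapping it into the source image about the image centre and taking
    the nearest source pixel; pixels that map outside the source become 0.
    """
    h, w = len(glyph), len(glyph[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    k = rng.uniform(-max_shear, max_shear)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Forward map M = s * Rotation(a) * [[1, k], [0, 1]] acting on (x, y).
    m00, m01 = s * cos_a, s * (cos_a * k - sin_a)
    m10, m11 = s * sin_a, s * (sin_a * k + cos_a)
    det = m00 * m11 - m01 * m10  # equals s**2, never zero for s > 0
    i00, i01 = m11 / det, -m01 / det
    i10, i11 = -m10 / det, m00 / det
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = round(i00 * dx + i01 * dy + cx)
            sy = round(i10 * dx + i11 * dy + cy)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = glyph[sy][sx]
    return out


def balance_classes(samples: Dict[str, List[Glyph]],
                    target: Optional[int] = None,
                    seed: int = 0) -> Dict[str, List[Glyph]]:
    """Top up every class with synthetic glyphs until it has `target` samples.

    `target` defaults to the size of the largest class. Original samples are
    kept first; synthetic ones are deformed copies of randomly chosen originals.
    """
    if any(len(glyphs) == 0 for glyphs in samples.values()):
        raise ValueError("every class needs at least one real sample")
    rng = random.Random(seed)
    if target is None:
        target = max(len(glyphs) for glyphs in samples.values())
    balanced: Dict[str, List[Glyph]] = {}
    for label, glyphs in samples.items():
        extended = list(glyphs)
        while len(extended) < target:
            extended.append(random_affine(rng.choice(glyphs), rng))
        balanced[label] = extended
    return balanced
```

For example, `balance_classes({"a": [g1, g2, g3], "b": [g4]})` would return both classes with three samples each, with class `"b"` gaining two deformed copies of `g4`. Keeping the deformations small is meant to preserve stroke structure; suitable ranges would have to be tuned on real Tangut glyphs.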

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
Yifei Meng, Xue Yuan & Xueye Wei
School of Physics and Electronic-Electrical Engineering, Ningxia University, Yinchuan, China
Yifei Meng, Wenhui Yang & Yan Chen

Corresponding author

Correspondence to Yifei Meng.

Additional information

The article is published in the original.

About this article

Cite this article

Meng, Y., Yuan, X., Wei, X. et al. Synthetic Sample Extension in Implementation of Tangut Character Databases. Aut. Control Comp. Sci. 52, 334–343 (2018). https://doi.org/10.3103/S0146411618040089

Received: 10 May 2017
Accepted: 15 January 2018
Published: 20 September 2018
Issue Date: July 2018
DOI: https://doi.org/10.3103/S0146411618040089