A statistical method for Uyghur tokenization | IEEE Conference Publication | IEEE Xplore

A statistical method for Uyghur tokenization

Publisher: IEEE

Abstract:

Tokenization is very important for Uyghur language processing. Tokenization of Uyghur, an agglutinative language, is quite different from that of other languages such as Chinese and English. In this paper we propose a two-step statistical tokenization method for Uyghur. Two related factors, the feature template scheme and the manually tokenized corpora, are also discussed. Preliminary experimental results demonstrate that the proposed method is effective: the F-measure of tokenization reaches 88.9% in the open test.
Date of Conference: 24-27 September 2009
Date Added to IEEE Xplore: 06 November 2009
Conference Location: Dalian, China
