A statistical method for Uyghur tokenization

Abstract:

Tokenization is very important for Uyghur language processing. Tokenization of Uyghur, an agglutinative language, is quite different from that of other languages such as Chinese and English. In this paper we propose a two-step statistical tokenization method for Uyghur. Two related factors, the feature template scheme and the manually tokenized corpora, are also discussed. The preliminary experimental results demonstrate that the proposed method is effective: the F-measure of tokenization reaches 88.9% in the open test.
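
The abstract reports an F-measure but does not say how it is computed. As a rough illustration only, the sketch below uses the common span-based definition: a predicted token counts as correct when its character span matches a gold-standard token exactly. This evaluation protocol is an assumption, not the paper's confirmed procedure, and the function names and the placeholder tokens (`stem`, `suf1`, `suf2`) are hypothetical.

```python
def token_spans(tokens):
    """Convert a list of tokens into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for tok in tokens:
        end = start + len(tok)
        spans.add((start, end))
        start = end
    return spans


def tokenization_f_measure(gold_tokens, predicted_tokens):
    """Return (precision, recall, F-measure) for one tokenized string.

    Assumption: a predicted token is correct only if its span matches
    a gold token exactly; this may differ from the paper's protocol.
    """
    gold = token_spans(gold_tokens)
    pred = token_spans(predicted_tokens)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical placeholder segmentation, not data from the paper.
gold = ["stem", "suf1", "suf2"]
predicted = ["stem", "suf1suf2"]
precision, recall, f_measure = tokenization_f_measure(gold, predicted)
print(precision, recall, f_measure)
```

Here one of the two predicted tokens matches a gold span, so precision is 1/2, recall is 1/3, and the F-measure is their harmonic mean.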

Published in: 2009 International Conference on Natural Language Processing and Knowledge Engineering

Date of Conference: 24-27 September 2009

Date Added to IEEE Xplore: 06 November 2009

DOI: 10.1109/NLPKE.2009.5313764

Conference Location: Dalian, China
