Abstract
One of the most significant problems in POS (Part-of-Speech) tagging of Chinese text is identifying the words in a sentence, since no whitespace delimits them. Because it is impossible to register every word in a dictionary in advance, unknown words inevitably occur during this process. As a result, unknown words significantly affect the accuracy of the synthesized speech in a Chinese TTS (Text-to-Speech) system. In this paper, we present an SVM (support vector machine) based method that predicts unknown words from the output of word segmentation and tagging. To achieve the processing speed required by a TTS system, we pre-detect the candidate boundaries of unknown words before starting the actual prediction. Unknown word prediction is therefore performed in two phases: detection, then prediction. Experimental results are very promising, showing high precision and high recall at high speed.
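The two-phase idea can be illustrated with a minimal sketch. The abstract does not specify the segmenter, the detection rule, or the SVM features, so everything below is an assumption made for illustration: `segment` uses forward maximum matching, `detect_candidates` flags runs of adjacent single-character tokens as candidate spans, and `toy_score` is a hand-set linear decision function standing in for a trained SVM. All function names are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching; characters with no dictionary match
    fall out as single-character tokens (assumed segmenter)."""
    tokens = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in dictionary:
                tokens.append(piece)
                i += n
                break
    return tokens


def detect_candidates(tokens: List[str]) -> List[Tuple[int, int]]:
    """Phase 1 (detection): cheaply mark runs of two or more adjacent
    single-character tokens as candidate unknown-word spans.
    This heuristic is an assumption, not the paper's stated rule."""
    spans = []
    start = None
    for idx, tok in enumerate(tokens + [""]):  # empty sentinel flushes the last run
        if len(tok) == 1:
            if start is None:
                start = idx
        elif start is not None:
            if idx - start >= 2:
                spans.append((start, idx))
            start = None
    return spans


def toy_score(span_tokens: List[str]) -> float:
    """Stand-in for an SVM decision function w.x + b with one feature
    (span length) and hand-set weights; a real system would train this."""
    weight, bias = 1.0, -1.5
    return weight * len(span_tokens) + bias


def predict_unknown_words(
    tokens: List[str],
    spans: List[Tuple[int, int]],
    score: Callable[[List[str]], float],
) -> List[str]:
    """Phase 2 (prediction): run the classifier only on candidate spans
    and merge a span into one word when its score is positive."""
    span_end = dict(spans)
    out = []
    i = 0
    while i < len(tokens):
        end = span_end.get(i)
        if end is not None and score(tokens[i:end]) > 0:
            out.append("".join(tokens[i:end]))
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out


dictionary = {"我", "喜欢", "北京"}
tokens = segment("我喜欢奥巴马", dictionary)
candidates = detect_candidates(tokens)
print(predict_unknown_words(tokens, candidates, toy_score))
```

The point of the split is that the classifier is invoked only on the spans that phase 1 flags, rather than at every position in the sentence, which is one plausible way to obtain the speed the abstract calls for.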
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ha, J., Zheng, Y., Kim, B., Lee, G.G., Seong, YS. (2005). High Speed Unknown Word Prediction Using Support Vector Machine for Chinese Text-to-Speech Systems. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_54
DOI: https://doi.org/10.1007/978-3-540-30211-7_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)