High Speed Unknown Word Prediction Using Support Vector Machine for Chinese Text-to-Speech Systems

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

One of the most significant problems in part-of-speech (POS) tagging of Chinese text is identifying the words in a sentence, since no blanks delimit them. Because it is impossible to pre-register every word in a dictionary, unknown words inevitably occur during this process, and they have a marked effect on the accuracy of the speech output of a Chinese text-to-speech (TTS) system. In this paper, we present a support vector machine (SVM) based method that predicts unknown words from the output of word segmentation and tagging. To reach the processing speed that a TTS system requires, we detect the candidate boundaries of unknown words before starting the actual prediction. Unknown word prediction is thus performed in two phases: detection, then prediction. The experimental results are very promising, showing high precision and high recall together with high speed.
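The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how candidates are detected or which features the SVM uses. The sketch assumes that a run of consecutive single-character tokens in the segmenter output marks a candidate region (a common heuristic, adopted here only for illustration), and it replaces the trained SVM with a placeholder callable `is_word`. The names `detect_candidates`, `predict_unknown_words`, `stand_in_classifier`, and the `UNK` tag are all hypothetical.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag) from the segmenter/tagger


def detect_candidates(tokens: List[Token], min_run: int = 2) -> List[Tuple[int, int]]:
    """Phase 1 (assumed heuristic): return [start, end) spans covering runs of
    at least `min_run` consecutive single-character tokens."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, (surface, _tag) in enumerate(tokens):
        if len(surface) == 1:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                spans.append((start, i))
            start = None
    # Close a run that reaches the end of the sentence.
    if start is not None and len(tokens) - start >= min_run:
        spans.append((start, len(tokens)))
    return spans


def predict_unknown_words(
    tokens: List[Token], is_word: Callable[[List[Token]], bool]
) -> List[Token]:
    """Phase 2: call the (expensive) classifier only on detected candidates and
    merge each accepted span into one token tagged with the hypothetical UNK."""
    out: List[Token] = []
    pos = 0
    for start, end in detect_candidates(tokens):
        out.extend(tokens[pos:start])  # copy tokens before the candidate
        span = tokens[start:end]
        if is_word(span):
            out.append(("".join(surface for surface, _ in span), "UNK"))
        else:
            out.extend(span)  # rejected: keep the original segmentation
        pos = end
    out.extend(tokens[pos:])
    return out


def stand_in_classifier(span: List[Token]) -> bool:
    """Placeholder for a trained SVM decision function (assumption):
    accept candidate spans of two or three characters."""
    return 2 <= len(span) <= 3


# Hypothetical segmenter output in which a personal name was split apart.
tokens = [("今天", "t"), ("王", "n"), ("小", "a"), ("明", "a"), ("访问", "v"), ("北京", "ns")]
merged = predict_unknown_words(tokens, stand_in_classifier)
print(merged)
```

The point of the split is cost: the cheap detection pass scans every token once, while the classifier runs only on the few spans that pass detection, which is where the speed gain claimed in the abstract would come from under these assumptions.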

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ha, J., Zheng, Y., Kim, B., Lee, G.G., Seong, YS. (2005). High Speed Unknown Word Prediction Using Support Vector Machine for Chinese Text-to-Speech Systems. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_54

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
