Semi Supervised Learning for Prediction of Prosodic Phrase Boundaries in Chinese TTS Using Conditional Random Fields

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6676)

Abstract

Hierarchical prosody structure generation is a key component of a speech synthesis system. One major feature of the prosody of Mandarin Chinese speech flow is prosodic phrase grouping. In this paper we propose an approach that uses conditional random fields to predict Chinese prosodic phrase boundaries from a limited amount of labeled training examples together with unlabeled data. Useful unlabeled examples are selected based on the labels assigned by the currently learned model and its prediction probabilities, and are then exploited to improve learning. Experiments show that the approach improves overall performance, with gains in both precision and recall.
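
The selection step described in the abstract can be illustrated with a generic self-training loop. The following is a minimal sketch, not the authors' implementation: the abstract does not state the exact selection criterion, so the rule used here (keep a sentence only if every predicted label has probability at or above a threshold), the names `select_confident` and `self_train`, and the `threshold` and `max_rounds` parameters are all assumptions made for illustration. The `train` argument stands in for any routine that fits a sequence model such as a CRF and returns a predictor.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = Sequence[str]
Labels = List[str]
Example = Tuple[Tokens, Labels]
# A predictor maps one sentence to (predicted labels, probability of each label).
Predictor = Callable[[Tokens], Tuple[Labels, List[float]]]
# A trainer fits a model on labeled examples and returns a predictor.
Trainer = Callable[[List[Example]], Predictor]


def select_confident(
    unlabeled: List[Tokens], predict: Predictor, threshold: float
) -> Tuple[List[Example], List[Tokens]]:
    """Split unlabeled sentences into confidently auto-labeled examples and the rest.

    Assumed criterion: the lowest per-token probability must reach `threshold`.
    """
    selected: List[Example] = []
    remaining: List[Tokens] = []
    for tokens in unlabeled:
        labels, probs = predict(tokens)
        if probs and min(probs) >= threshold:
            selected.append((tokens, labels))
        else:
            remaining.append(tokens)
    return selected, remaining


def self_train(
    labeled: List[Example],
    unlabeled: List[Tokens],
    train: Trainer,
    threshold: float = 0.9,
    max_rounds: int = 5,
) -> Tuple[Predictor, List[Example]]:
    """Retrain repeatedly, adding confidently labeled sentences each round."""
    train_set = list(labeled)
    pool = list(unlabeled)
    predict = train(train_set)
    for _ in range(max_rounds):
        selected, pool = select_confident(pool, predict, threshold)
        if not selected:
            break  # nothing new is confident enough; stop early
        train_set.extend(selected)
        predict = train(train_set)
    return predict, train_set
```

Requiring the minimum per-token probability to clear the threshold is a conservative choice: a single uncertain boundary excludes the whole sentence, which limits the amount of label noise fed back into training at the cost of using fewer unlabeled sentences.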


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, Z., Ma, X., Pei, W. (2011). Semi Supervised Learning for Prediction of Prosodic Phrase Boundaries in Chinese TTS Using Conditional Random Fields. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds) Advances in Neural Networks – ISNN 2011. ISNN 2011. Lecture Notes in Computer Science, vol 6676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21090-7_56

  • DOI: https://doi.org/10.1007/978-3-642-21090-7_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21089-1

  • Online ISBN: 978-3-642-21090-7

  • eBook Packages: Computer Science, Computer Science (R0)
