Semi Supervised Learning for Prediction of Prosodic Phrase Boundaries in Chinese TTS Using Conditional Random Fields

Zhao, Ziping; Ma, Xirong; Pei, Weidong

doi:10.1007/978-3-642-21090-7_56

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6676)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

Hierarchical prosody structure generation is a key component of a speech synthesis system. One major feature of the prosody of Mandarin Chinese speech flow is prosodic phrase grouping. In this paper we propose an approach for predicting Chinese prosodic phrase boundaries from a limited amount of labeled training examples and some unlabeled data, using conditional random fields. Useful unlabeled examples are selected based on the labels assigned by the currently learned model and its prediction probabilities, and are then exploited to improve learning. Experiments show that the approach improves overall performance, with gains in both precision and recall.
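
The selection loop described in the abstract resembles self-training, and can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the feature set, the selection criterion, or any threshold, so `ToyTagger`, `self_train`, `threshold`, and `max_rounds` are all hypothetical names. To keep the example self-contained, a simple per-token frequency tagger stands in for the conditional random field; a real system would replace it with a CRF and use the probability of the decoded label sequence as the confidence score.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[str]  # tokens, e.g. words or part-of-speech tags
Labels = List[str]    # "B" = phrase boundary after the token, "N" = none


class ToyTagger:
    """Stand-in for a CRF (assumption): per-token label counts, add-one smoothed."""

    LABELS = ("B", "N")

    def fit(self, data: List[Tuple[Sentence, Labels]]) -> "ToyTagger":
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, labels in data:
            for tok, lab in zip(tokens, labels):
                self.counts[tok][lab] += 1
        return self

    def predict(self, tokens: Sentence) -> Tuple[Labels, float]:
        """Return the chosen label sequence and its probability under the model."""
        labels, prob = [], 1.0
        for tok in tokens:
            c = self.counts.get(tok, Counter())
            total = sum(c.values()) + len(self.LABELS)
            best = max(self.LABELS, key=lambda lab: c[lab])
            labels.append(best)
            prob *= (c[best] + 1) / total
        return labels, prob


def self_train(
    labeled: List[Tuple[Sentence, Labels]],
    unlabeled: List[Sentence],
    threshold: float = 0.5,  # hypothetical confidence cut-off
    max_rounds: int = 5,     # hypothetical iteration limit
) -> Tuple[ToyTagger, List[Tuple[Sentence, Labels]]]:
    train = list(labeled)
    pool = list(unlabeled)
    model = ToyTagger().fit(train)
    for _ in range(max_rounds):
        confident, rest = [], []
        for tokens in pool:
            labels, prob = model.predict(tokens)
            # Keep only sentences the current model labels with high probability.
            (confident if prob >= threshold else rest).append((tokens, labels))
        if not confident:
            break  # nothing useful left in the unlabeled pool
        train.extend(confident)  # treat the assigned labels as training data
        pool = [tokens for tokens, _ in rest]
        model = ToyTagger().fit(train)  # retrain on the enlarged set
    return model, train
```

Calling `self_train(labeled, unlabeled)` returns the final model together with the enlarged training set. Sentences whose predicted labeling falls below `threshold` stay in the pool and are reconsidered in later rounds, so a stricter threshold trades coverage for cleaner pseudo-labels.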

Author information

Authors and Affiliations

College of Computer and Information Engineering, Tianjin Normal University, Tianjin, China
Ziping Zhao, Xirong Ma & Weidong Pei

Editor information

Editors and Affiliations

Institute of Automation, Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences, 100190, Beijing, China
Derong Liu
College of Information Science and Engineering, Northeastern University, 110004, Shenyang, Liaoning, China
Huaguang Zhang
Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Avenue, 1678, Nicosia, Cyprus
Marios Polycarpou
Dipartimento di Elettronica, Politecnico di Milano, Piazza L. da Vinci 32, 20133, Milano, Italy
Cesare Alippi
Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, 02881, Kingston, RI, USA
Haibo He

Cite this paper

Zhao, Z., Ma, X., Pei, W. (2011). Semi Supervised Learning for Prediction of Prosodic Phrase Boundaries in Chinese TTS Using Conditional Random Fields. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds) Advances in Neural Networks – ISNN 2011. ISNN 2011. Lecture Notes in Computer Science, vol 6676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21090-7_56

DOI: https://doi.org/10.1007/978-3-642-21090-7_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21089-1
Online ISBN: 978-3-642-21090-7
eBook Packages: Computer Science, Computer Science (R0)
