Abstract
This paper describes a tree-based model of prosodic phrasing for Chinese text-to-speech (TTS) systems. The model uses the classification and regression tree (CART) technique to generate the decision tree automatically. We collected 559 sentences from CCTV news programs and built a corresponding speech corpus read by a professional male announcer. Prosodic boundaries were marked manually on the recorded speech, and word identification, part-of-speech tagging, and syntactic analysis were performed on the text. A decision tree was then trained on 371 sentences (approximately 50 min of speech) and tested on the remaining 188 sentences (approximately 28 min of speech). Features for modeling prosody are proposed, and their effectiveness is assessed by interpreting the resulting tree. We achieved a success rate of about 93%.
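The tree-growing idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features, so the three used here (part of speech of the word to the left of a candidate boundary, part of speech of the word to the right, and the number of syllables since the previous boundary) and all data values are hypothetical. The sketch grows a small classification tree by greedily choosing the binary split with the lowest weighted Gini impurity, which is one common CART splitting criterion, and omits pruning.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def matches(row, f, v):
    """Numeric features split on x <= v; categorical features on x == v."""
    if isinstance(v, (int, float)):
        return row[f] <= v
    return row[f] == v

def best_split(rows, labels):
    """Return the (feature, value) split that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for v in {r[f] for r in rows}:
            left = [y for r, y in zip(rows, labels) if matches(r, f, v)]
            right = [y for r, y in zip(rows, labels) if not matches(r, f, v)]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, v), score
    return best

def build_tree(rows, labels, depth=0, max_depth=5):
    """Grow the tree recursively; a leaf is the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, v = split
    li = [i for i, r in enumerate(rows) if matches(r, f, v)]
    ri = [i for i, r in enumerate(rows) if not matches(r, f, v)]
    return (f, v,
            build_tree([rows[i] for i in li], [labels[i] for i in li],
                       depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                       depth + 1, max_depth))

def predict(tree, row):
    """Follow splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, v, left, right = tree
        tree = left if matches(row, f, v) else right
    return tree

def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label equals the reference label."""
    hits = sum(predict(tree, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# Hypothetical toy data: (left POS, right POS, syllables since last boundary).
# Label 1 marks a prosodic boundary, label 0 marks no boundary.
train_rows = [("n", "v", 2), ("v", "n", 3), ("n", "p", 6), ("v", "c", 7),
              ("a", "n", 1), ("n", "v", 8), ("d", "v", 2), ("n", "c", 5)]
train_labels = [0, 0, 1, 1, 0, 1, 0, 1]
test_rows = [("n", "v", 4), ("a", "n", 2), ("v", "p", 9)]
test_labels = [1, 0, 1]

tree = build_tree(train_rows, train_labels)
print("tree:", tree)
print("held-out accuracy:", accuracy(tree, test_rows, test_labels))
```

Reading the learned tree from the root shows which feature the splits rely on, which mirrors, on a toy scale, the way the abstract describes judging feature effectiveness by interpreting the resulting tree. The held-out accuracy here plays the role of the success rate measured on the test sentences, although the figure for this toy data says nothing about the paper's result.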
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, W., Lin, F., Li, J., Zhang, B. (2001). A Tree-Based Model of Prosodic Phrasing for Chinese Text-to-Speech Systems. In: Shum, HY., Liao, M., Chang, SF. (eds) Advances in Multimedia Information Processing — PCM 2001. PCM 2001. Lecture Notes in Computer Science, vol 2195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45453-5_143
DOI: https://doi.org/10.1007/3-540-45453-5_143
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42680-6
Online ISBN: 978-3-540-45453-3
eBook Packages: Springer Book Archive