A Tree-Based Model of Prosodic Phrasing for Chinese Text-to-Speech Systems

Chen, Weijun; Lin, Fuzong; Li, Jianmin; Zhang, Bo

doi:10.1007/3-540-45453-5_143

Weijun Chen^7,8,
Fuzong Lin^7,8,
Jianmin Li^7,8 &
…
Bo Zhang^7,8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2195)

Included in the following conference series:

Pacific-Rim Conference on Multimedia


Abstract

This paper describes a tree-based model of prosodic phrasing for Chinese text-to-speech (TTS) systems. The model uses classification and regression tree (CART) techniques to generate the decision tree automatically. We collected 559 sentences from a CCTV news program and built a corresponding speech corpus read by a professional male announcer. The prosodic boundaries were manually marked on the recorded speech, and word identification, part-of-speech tagging, and syntactic analysis were performed on the text. A decision tree was then trained on 371 sentences (approximately 50 min of speech) and tested on the remaining 188 sentences (approximately 28 min of speech). Features for modeling prosody are proposed, and their effectiveness is measured by interpreting the resulting tree. We achieved a success rate of about 93%.
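To make the tree-growing step concrete, the following is a minimal sketch of how a CART-style learner might choose its first question for boundary prediction. It is not the authors' implementation: the feature names (`pos_left`, `pos_right`), the tag values, the toy rows, and the labels (`B` for a prosodic boundary, `N` for none) are all hypothetical, and Gini impurity is used here only because it is one standard CART splitting criterion; the abstract does not say which criterion was used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, value, weighted_gini) for the best binary question.

    Each candidate question has the form "is rows[i][feature] == value?".
    The question with the lowest size-weighted impurity of its two
    branches wins; questions that leave one branch empty are skipped.
    """
    best = None
    n = len(rows)
    for feature in rows[0]:
        for value in {row[feature] for row in rows}:
            yes = [y for row, y in zip(rows, labels) if row[feature] == value]
            no = [y for row, y in zip(rows, labels) if row[feature] != value]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if best is None or score < best[2]:
                best = (feature, value, score)
    return best


# Hypothetical toy data: one row per word juncture, described by the
# part-of-speech tags of the words to its left and right.
rows = [
    {"pos_left": "n", "pos_right": "v"},
    {"pos_left": "n", "pos_right": "p"},
    {"pos_left": "a", "pos_right": "n"},
    {"pos_left": "v", "pos_right": "n"},
    {"pos_left": "n", "pos_right": "c"},
    {"pos_left": "d", "pos_right": "v"},
]
labels = ["B", "B", "N", "N", "B", "N"]  # B = boundary, N = no boundary

feature, value, score = best_split(rows, labels)
print(f"first question: is {feature} == {value!r}? (weighted Gini {score:.2f})")
```

A full CART learner would apply `best_split` recursively to each branch and then prune the tree; reading the questions chosen near the root is one way to judge which features matter most.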



Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Weijun Chen, Fuzong Lin, Jianmin Li & Bo Zhang
State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, 100084, Beijing, China
Weijun Chen, Fuzong Lin, Jianmin Li & Bo Zhang


Editor information

Editors and Affiliations

Microsoft Research China, 5/F Beijing Sigma Center, 49 Zhichun Road, Haidian District, Beijing, 100080, China
Heung-Yeung Shum
Institute of Information Science, Academia Sinica, Taiwan
Mark Liao
Department of Electrical Engineering, Columbia University, New York, NY, 10027, USA
Shih-Fu Chang


About this paper

Cite this paper

Chen, W., Lin, F., Li, J., Zhang, B. (2001). A Tree-Based Model of Prosodic Phrasing for Chinese Text-to-Speech Systems. In: Shum, HY., Liao, M., Chang, SF. (eds) Advances in Multimedia Information Processing — PCM 2001. PCM 2001. Lecture Notes in Computer Science, vol 2195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45453-5_143


DOI: https://doi.org/10.1007/3-540-45453-5_143
Published: 20 November 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42680-6
Online ISBN: 978-3-540-45453-3
eBook Packages: Springer Book Archive
