Abstract
This paper describes an F0 model for speech synthesis that combines templates with a statistical method. Focusing on the notion of templates, we confirm that F0 patterns for a speech unit can be extracted from the various anamorphoses of F0 contours in spontaneous speech. A prosody cost function and a statistical training method are then used to assign and adapt the weights for template selection in real applications. Unlike other methods, the approach can give feedback on exactly which parameters are crucial in determining the successful choice of patterns. A final test shows that the method can generate synthesized speech with high naturalness and that it is also well suited to multilingual prosody processing.
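The abstract does not give the form of the prosody cost function, so the following is only a minimal sketch of how weighted template selection of this general kind might look. The feature names (`tone`, `position`), the mismatch-based cost, and the weight values are all assumptions made for illustration; they are not taken from the paper, and no statistical training of the weights is shown.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """A stored F0 pattern with the context it was extracted from (hypothetical fields)."""
    name: str
    features: dict


def prosody_cost(target: dict, template: Template, weights: dict) -> float:
    """Sum the weights of all features on which target and template disagree.

    This mismatch cost is an assumed stand-in for the paper's cost function.
    """
    return sum(
        w for key, w in weights.items()
        if target.get(key) != template.features.get(key)
    )


def select_template(target: dict, templates: list, weights: dict) -> Template:
    """Return the template with the lowest prosody cost for the target context."""
    return min(templates, key=lambda t: prosody_cost(target, t, weights))


# Illustrative data only: three templates and hand-set weights.
templates = [
    Template("t1", {"tone": 1, "position": "initial"}),
    Template("t2", {"tone": 1, "position": "final"}),
    Template("t3", {"tone": 4, "position": "final"}),
]
weights = {"tone": 2.0, "position": 1.0}
target = {"tone": 4, "position": "initial"}

best = select_template(target, templates, weights)
print(best.name)
```

In this sketch the size of each weight indicates how strongly a feature influences the choice, which is one way to read the abstract's remark about feedback on the crucial parameters. In the paper the weights are assigned and adapted by statistical training rather than set by hand.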
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tao, J. (2004). F0 Prediction Model of Speech Synthesis Based on Template and Statistical Method. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_63
DOI: https://doi.org/10.1007/978-3-540-30120-2_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive