Abstract
In this paper we present a condensed description of a European Portuguese segmental duration model for TTS purposes and concentrate on its evaluation. The model is based on artificial neural networks. Model quality was evaluated by comparison with read speech. The standard deviation obtained on the test set is 19.5 ms and the linear correlation coefficient is 0.84. In a perceptual evaluation, the model scored 4.12 against 4.30 for natural human read speech on a scale of 5.
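As an illustration of the two objective figures quoted above, the following minimal sketch computes a standard deviation and a linear (Pearson) correlation coefficient between predicted and measured segment durations. It rests on assumptions not stated in the abstract: that the standard deviation is taken over the prediction error (predicted minus measured), and that the population form (division by n) is used. The function name and the sample values are hypothetical and are not the paper's data.

```python
import math


def error_std_and_correlation(predicted_ms, observed_ms):
    """Return (std of prediction error, Pearson r) for two duration lists.

    Assumption: the error is predicted minus observed, and the population
    standard deviation (divide by n) is used. The abstract does not say
    which convention the authors followed.
    """
    n = len(predicted_ms)
    if n != len(observed_ms) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")

    # Standard deviation of the prediction error, in milliseconds.
    errors = [p - o for p, o in zip(predicted_ms, observed_ms)]
    mean_err = sum(errors) / n
    std_err = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / n)

    # Pearson linear correlation between predicted and observed durations.
    mean_p = sum(predicted_ms) / n
    mean_o = sum(observed_ms) / n
    cov = sum((p - mean_p) * (o - mean_o)
              for p, o in zip(predicted_ms, observed_ms))
    var_p = sum((p - mean_p) ** 2 for p in predicted_ms)
    var_o = sum((o - mean_o) ** 2 for o in observed_ms)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / math.sqrt(var_p * var_o)

    return std_err, r


# Hypothetical durations in milliseconds, for illustration only.
predicted = [50.0, 60.0, 70.0]
observed = [52.0, 60.0, 68.0]
std_err, r = error_std_and_correlation(predicted, observed)
print(f"std of error: {std_err:.2f} ms, correlation: {r:.2f}")
```

The two measures are complementary: the standard deviation expresses the spread of the error in milliseconds, while the correlation coefficient shows how well the predictions follow the variation in the measured durations regardless of scale.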
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Teixeira, J.P., Freitas, D. (2003). Evaluation of a Segmental Durations Model for TTS. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_6
DOI: https://doi.org/10.1007/3-540-45011-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40436-1
Online ISBN: 978-3-540-45011-5
eBook Packages: Springer Book Archive