Evaluation of a Segmental Durations Model for TTS

Teixeira, João Paulo; Freitas, Diamantino

doi:10.1007/3-540-45011-4_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2721)

Included in the following conference series:

International Workshop on Computational Processing of the Portuguese Language

Abstract

In this paper we present a condensed description of a European Portuguese segmental duration model for TTS purposes and concentrate on its evaluation. The model is based on artificial neural networks. Its quality was evaluated by comparison with read speech. On the test set, the standard deviation reached is 19.5 ms and the linear correlation coefficient is 0.84. In a perceptual evaluation, the model scored 4.12 against 4.30 for natural human read speech, on a scale of 5.
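
The two objective figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that the standard deviation refers to the error between predicted and measured segment durations, which the abstract does not state explicitly. The function names and the toy duration values are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def error_std(measured_ms, predicted_ms):
    """Population standard deviation of the error (predicted - measured), in ms.

    Assumption: the abstract's "standard deviation" is the spread of this error.
    """
    errors = [p - m for p, m in zip(predicted_ms, measured_ms)]
    return pstdev(errors)


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical segment durations in milliseconds, for illustration only.
    measured = [60.0, 85.0, 110.0, 45.0, 150.0]
    predicted = [70.0, 80.0, 100.0, 55.0, 140.0]
    print("error std (ms):", error_std(measured, predicted))
    print("correlation r:", pearson_r(measured, predicted))
```

In such a setup, the measured values would come from hand-labelled read speech and the predicted values from the duration model evaluated on a held-out test set.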

Author information

Authors and Affiliations

Polytechnic Institute of Bragança, Faculty of Engineering of University of Porto, Portugal
João Paulo Teixeira & Diamantino Freitas

Editor information

Editors and Affiliations

L2F, INESC-ID Lisboa, Technical University of Lisbon, Rua Alves Redol, 9, 1000-029, Lisbon, Portugal
Nuno J. Mamede & Isabel Trancoso
Faculty of Humanities and Social Sciences, University of Algarve, Campus de Gambelas, 8005-139, Faro, Portugal
Jorge Baptista
NILC, ICMC-USP São-Carlos, Av. do Trabalhador São-Carlense, 400, 13560-970, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes

About this paper

Cite this paper

Teixeira, J.P., Freitas, D. (2003). Evaluation of a Segmental Durations Model for TTS. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_6

DOI: https://doi.org/10.1007/3-540-45011-4_6
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40436-1
Online ISBN: 978-3-540-45011-5
eBook Packages: Springer Book Archive
