Evaluation of Deep Learning Approaches to Text-to-Speech Systems for European Portuguese

Quintas, Sebastião; Trancoso, Isabel

doi:10.1007/978-3-030-41505-1_4


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12037)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language


Abstract

Deep learning models are considered state-of-the-art for Text-to-Speech (TTS), producing very natural and realistic speech. However, these machine learning methods are known to require large amounts of data to perform well. It is therefore relevant to assess a system's ability to generalize to different instances, especially when learning from small data sets to create new voices. This study describes the assessment of a deep learning approach to TTS for European Portuguese. We show that transfer learning techniques can be used to fine-tune a Tacotron-2 model to a specific voice, preserving speaker identity without requiring large amounts of data. We also compare the developed model with a statistical parametric speech synthesizer enhanced by deep learning, concluding that Tacotron-2 provided overall better word pronunciation, naturalness and intonation.
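The transfer-learning idea summarized in the abstract (train on a large corpus first, then fine-tune on a small corpus for a new voice) can be illustrated with a toy example. This is a minimal sketch, not Tacotron-2 and not the authors' training setup: it assumes a two-parameter linear model, synthetic "source" and "target" data, and hand-picked learning rates and step counts, all of which are hypothetical. It only shows why a warm start from pre-trained parameters helps when the target data set is small and the training budget is short.

```python
import random


def train(w, b, data, lr, steps):
    """Full-batch gradient descent on mean squared error for the toy model y = w*x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(w, b, data):
    """Mean squared error of y = w*x + b on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)

# Hypothetical large "source" corpus (stand-in for a base voice with plenty of data).
xs_src = [rng.uniform(-1, 1) for _ in range(2000)]
source = [(x, 2.0 * x + 0.5 + rng.gauss(0, 0.05)) for x in xs_src]

# Hypothetical small "target" corpus (stand-in for a new speaker): a related but shifted mapping.
xs_tgt = [rng.uniform(-1, 1) for _ in range(20)]
target = [(x, 2.2 * x + 0.3 + rng.gauss(0, 0.05)) for x in xs_tgt]

# 1) Pre-train on the large corpus, starting from zero.
w_pre, b_pre = train(0.0, 0.0, source, lr=0.1, steps=500)

# 2) Fine-tune: warm start from the pre-trained parameters, short budget on the small corpus.
w_ft, b_ft = train(w_pre, b_pre, target, lr=0.05, steps=20)

# Baseline: the same short budget on the small corpus, but starting from scratch.
w_cold, b_cold = train(0.0, 0.0, target, lr=0.05, steps=20)

print(f"target MSE, pre-trained only : {mse(w_pre, b_pre, target):.4f}")
print(f"target MSE, fine-tuned       : {mse(w_ft, b_ft, target):.4f}")
print(f"target MSE, trained from zero: {mse(w_cold, b_cold, target):.4f}")
```

Under these assumptions, the warm-started model should reach a much lower target loss than the cold-started one within the same 20 steps. In a real TTS system the same pattern would be applied to all network weights rather than to two scalars.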



Acknowledgments

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2019. The authors gratefully acknowledge the contributions of Ana Londral, Sérgio Paulo, Luís Bernardo, and Catarina Gonçalves.

Author information

Authors and Affiliations

INESC-ID, Lisbon, Portugal
Sebastião Quintas & Isabel Trancoso
Instituto Superior Técnico, Lisbon, Portugal
Sebastião Quintas & Isabel Trancoso


Corresponding author

Correspondence to Sebastião Quintas.

Editor information

Editors and Affiliations

University of Évora, Évora, Portugal
Paulo Quaresma
University of Évora, Évora, Portugal
Renata Vieira
University of São Paulo, São Carlos, Brazil
Sandra Aluísio
University of Lisbon, Lisbon, Portugal
Helena Moniz
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
University of Évora, Évora, Portugal
Teresa Gonçalves


About this paper

Cite this paper

Quintas, S., Trancoso, I. (2020). Evaluation of Deep Learning Approaches to Text-to-Speech Systems for European Portuguese. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds) Computational Processing of the Portuguese Language. PROPOR 2020. Lecture Notes in Computer Science, vol 12037. Springer, Cham. https://doi.org/10.1007/978-3-030-41505-1_4


Published: 24 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41504-4
Online ISBN: 978-3-030-41505-1
eBook Packages: Computer Science, Computer Science (R0)
