Abstract
Deep learning models are considered the state of the art in text-to-speech (TTS), producing very natural and realistic results. However, these machine learning methods usually require large amounts of data to operate properly. Assessing a system's ability to generalize to different instances therefore becomes relevant, especially when learning from small data sets to create new voices. This study assesses a deep learning approach to TTS for European Portuguese. We show that transfer learning techniques can be used to fine-tune a Tacotron-2 model to a specific voice, preserving speaker identity without requiring large amounts of data. We also compare the developed model with a statistical parametric speech synthesizer enhanced by deep learning, and conclude that Tacotron-2 provides better overall word pronunciation, naturalness, and intonation.
Acknowledgments
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2019. The authors gratefully acknowledge the contributions of Ana Londral, Sérgio Paulo, Luís Bernardo, and Catarina Gonçalves.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Quintas, S., Trancoso, I. (2020). Evaluation of Deep Learning Approaches to Text-to-Speech Systems for European Portuguese. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds) Computational Processing of the Portuguese Language. PROPOR 2020. Lecture Notes in Computer Science, vol 12037. Springer, Cham. https://doi.org/10.1007/978-3-030-41505-1_4
DOI: https://doi.org/10.1007/978-3-030-41505-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41504-4
Online ISBN: 978-3-030-41505-1
eBook Packages: Computer Science, Computer Science (R0)