Abstract
The Griffin-Lim algorithm that the Tacotron speech synthesis system uses to recover phase information leaves obvious processing artifacts in the synthesized speech and yields low fidelity. To address this, this paper proposes a speech synthesis method based on a Tacotron + WaveNet network architecture. The method builds on a sequence-to-sequence (Seq2Seq) mapping structure: the input text is first converted into one-hot vectors, an attention mechanism is then used to generate mel spectrograms, and finally a WaveNet vocoder serves as the back-end network that reconstructs the phase information of the speech signal, so that the input text is converted into a waveform. The experiments were conducted in English on the LJ-Speech dataset. The results show a mean opinion score (MOS) of 4.23, which indicates higher synthesis naturalness than the Tacotron end-to-end speech synthesis method.
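The first step named in the abstract, converting input text into one-hot vectors, can be illustrated with a minimal sketch. The character vocabulary and the function name `one_hot_encode` are assumptions made for illustration; the abstract does not state which symbol set or preprocessing the authors used.

```python
# Hypothetical character vocabulary (assumption): the abstract does not
# specify the symbol set used by the authors.
VOCAB = list("abcdefghijklmnopqrstuvwxyz '.,?!")
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def one_hot_encode(text):
    """Return one one-hot row per in-vocabulary character of `text`.

    The text is lowercased first; characters outside the assumed
    vocabulary are skipped rather than mapped to an unknown symbol.
    """
    rows = []
    for ch in text.lower():
        idx = CHAR_TO_INDEX.get(ch)
        if idx is None:
            continue  # character not in the assumed vocabulary
        row = [0] * len(VOCAB)
        row[idx] = 1  # exactly one position is set per character
        rows.append(row)
    return rows


encoded = one_hot_encode("Hi!")
for row in encoded:
    print(row.index(1), sum(row))
```

In a Seq2Seq model of this kind, such a sequence of one-hot rows would typically be passed to an embedding layer in the encoder; that step is not shown here.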
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, Y., Ma, Q., Wang, Y. (2020). Speech Synthesis Method Based on Tacotron + WaveNet. In: Liang, Q., Wang, W., Liu, X., Na, Z., Jia, M., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2019. Lecture Notes in Electrical Engineering, vol 571. Springer, Singapore. https://doi.org/10.1007/978-981-13-9409-6_78
DOI: https://doi.org/10.1007/978-981-13-9409-6_78
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9408-9
Online ISBN: 978-981-13-9409-6
eBook Packages: Engineering, Engineering (R0)