Abstract
This paper studies how the length of the window used during spectral envelope estimation influences the perceptual quality of HMM-based speech synthesis. We show that the acoustic differences due to variations in the window length are audible. The experiments reveal an overall preference towards short analysis windows, although longer windows seem to alleviate some artifacts related to training data scarcity.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Alonso, A., Erro, D., Navas, E., Hernaez, I. (2014). Fine Vocoder Tuning for HMM-Based Speech Synthesis: Effect of the Analysis Window Length. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_3
DOI: https://doi.org/10.1007/978-3-319-13623-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science (R0)