Abstract
Synthetic speech produced by a Hidden Markov Model (HMM)-based system is often reported to sound muffled when compared with natural speech. There are several reasons for this effect: some fine characteristics of natural speech are removed, minimized, or hidden in the modeling phase of the HMM system, and the resulting speech parameter trajectories become over-smoothed versions of the speech waveforms. As a result, each synthetic voice built with an HMM-based system must be tested for speech quality. This usually requires costly subjective testing, so objective alternatives are of interest. This paper considers nine acoustic parameters related to jitter and shimmer and examines their statistical significance as objective measures of synthetic speech quality.
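The abstract does not list the nine parameters, so the following is only a minimal sketch of the two underlying quantities, assuming the common "local" definitions: jitter as the mean absolute difference between consecutive glottal periods divided by the mean period, and shimmer as the same ratio computed on per-period peak amplitudes. The function names and the input values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def _local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods):
    """Cycle-to-cycle variation of glottal period durations (assumed in seconds)."""
    return _local_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of per-period peak amplitudes."""
    return _local_perturbation(amplitudes)


# Illustrative (made-up) measurements from a sustained vowel.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [0.80, 0.78, 0.82, 0.80]

jitter = local_jitter(periods)
shimmer = local_shimmer(amplitudes)
```

Under these assumed definitions, a perfectly periodic signal with constant amplitude yields zero for both measures, so comparing the values obtained from natural and synthetic vowels is one way such parameters could be put to use.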
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Coto-Jiménez, M., Goddard-Close, J., Martínez-Licona, F.M. (2014). Quality Assessment of HMM-Based Speech Synthesis Using Acoustical Vowel Analysis. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_46
DOI: https://doi.org/10.1007/978-3-319-11581-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)