Quality Assessment of HMM-Based Speech Synthesis Using Acoustical Vowel Analysis

Coto-Jiménez, Marvin; Goddard-Close, John; Martínez-Licona, Fabiola M.

doi:10.1007/978-3-319-11581-8_46

Marvin Coto-Jiménez^22,23, John Goddard-Close^23 & Fabiola M. Martínez-Licona^23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

The synthetic speech produced by a Hidden Markov Model (HMM)-based system is often reported to sound muffled compared with natural speech. Several factors contribute to this effect. Some precise, fine-grained characteristics of natural speech are removed, minimized or hidden during the modeling phase of the HMM system. As a result, the speech parameter trajectories become over-smoothed versions of those obtained from natural speech waveforms. Consequently, each synthetic voice built with an HMM-based system must be tested for its speech quality. Such testing usually relies on costly subjective evaluations, which makes objective alternatives attractive. This paper considers nine acoustic parameters related to jitter and shimmer and examines their statistical significance as objective measures of synthetic speech quality.
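Jitter and shimmer quantify cycle-to-cycle perturbations of the glottal period and of the peak amplitude in a sustained vowel. The abstract does not say which nine parameters were used. The sketch below therefore assumes the common Praat-style definitions: local, absolute, RAP, PPQ5 and DDP for jitter, and local, dB, APQ3, APQ5, APQ11 and DDA for shimmer.

It also makes two simplifications. First, it assumes the per-cycle periods and amplitudes have already been extracted. Second, it omits the period-range and amplitude-factor filtering that Praat applies. All function names are hypothetical illustrations, not the authors' pipeline.

```python
"""Simplified Praat-style jitter and shimmer measures (illustrative sketch).

Assumption: inputs are per-cycle values already extracted from a sustained
vowel; no period-range or amplitude-factor filtering is applied here.
"""
from statistics import fmean
import math


def _mean_abs_consecutive_diff(x):
    # Average of |x[i] - x[i-1]| over consecutive cycles.
    return fmean(abs(b - a) for a, b in zip(x, x[1:]))


def perturbation_quotient(x, k):
    """k-point perturbation quotient (k odd), relative to the mean of x.

    Each value is compared with the mean of the k-value window centred on it.
    On periods, k=3 gives RAP and k=5 gives PPQ5; on amplitudes, k=3, 5, 11
    give APQ3, APQ5 and APQ11. Needs at least k values.
    """
    h = k // 2
    deviations = [
        abs(x[i] - fmean(x[i - h:i + h + 1])) for i in range(h, len(x) - h)
    ]
    return fmean(deviations) / fmean(x)


def jitter_measures(periods):
    """periods: durations (s) of consecutive glottal cycles (hypothetical input)."""
    mean_t = fmean(periods)
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    return {
        "jitter_local": _mean_abs_consecutive_diff(periods) / mean_t,
        "jitter_local_abs": _mean_abs_consecutive_diff(periods),  # seconds
        "rap": perturbation_quotient(periods, 3),
        "ppq5": perturbation_quotient(periods, 5),
        # DDP: mean absolute difference of consecutive period differences.
        "ddp": _mean_abs_consecutive_diff(diffs) / mean_t,
    }


def shimmer_measures(amplitudes):
    """amplitudes: peak amplitude of each consecutive glottal cycle."""
    mean_a = fmean(amplitudes)
    diffs = [b - a for a, b in zip(amplitudes, amplitudes[1:])]
    return {
        "shimmer_local": _mean_abs_consecutive_diff(amplitudes) / mean_a,
        # Mean absolute amplitude ratio of consecutive cycles, in decibels.
        "shimmer_local_db": fmean(
            abs(20 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])
        ),
        "apq3": perturbation_quotient(amplitudes, 3),
        "apq5": perturbation_quotient(amplitudes, 5),
        "apq11": perturbation_quotient(amplitudes, 11),
        # DDA: mean absolute difference of consecutive amplitude differences.
        "dda": _mean_abs_consecutive_diff(diffs) / mean_a,
    }


if __name__ == "__main__":
    # Toy input: periods alternating between 10 ms and 11 ms.
    print(jitter_measures([0.010, 0.011] * 6))
    print(shimmer_measures([1.0, 2.0] * 6))
```

With these definitions, DDP equals three times RAP and DDA equals three times APQ3, because each DDP/DDA term is a rescaled form of the corresponding 3-point term. A perfectly periodic input yields zero for every measure.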



Author information

Authors and Affiliations

Electrical Engineering School, Universidad de Costa Rica, San José, Costa Rica
Marvin Coto-Jiménez
Electrical Engineering Department, Universidad Autónoma Metropolitana, Mexico City, México
Marvin Coto-Jiménez, John Goddard-Close & Fabiola M. Martínez-Licona


Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic


About this paper

Cite this paper

Coto-Jiménez, M., Goddard-Close, J., Martínez-Licona, F.M. (2014). Quality Assessment of HMM-Based Speech Synthesis Using Acoustical Vowel Analysis. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_46


DOI: https://doi.org/10.1007/978-3-319-11581-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)
