Quality Assessment of HMM-Based Speech Synthesis Using Acoustical Vowel Analysis

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

Synthetic speech produced by a Hidden Markov Model (HMM)-based system is often reported to sound muffled when compared with natural speech. There are several reasons for this effect: some precise, fine characteristics of natural speech are removed, minimized or hidden during the modeling phase of the HMM system, and the resulting speech parameter trajectories become over-smoothed versions of those of the original speech. As a consequence, the speech quality of every synthetic voice built with an HMM-based system must be tested. Such testing usually relies on costly subjective evaluations, so objective alternatives are of interest. This paper considers nine acoustic parameters related to jitter and shimmer and examines their statistical significance as objective measures of synthetic speech quality.
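To make the kind of measure involved more concrete, the following is a minimal Python sketch of four common perturbation measures, following the definitions documented for Praat's voice report: jitter (local), jitter (RAP), shimmer (local) and shimmer (local, dB). The abstract does not say which nine parameters the paper uses, so this selection is an assumption. The `moving_average` helper is a hypothetical toy stand-in for HMM over-smoothing, not the paper's procedure, and the input data are constructed by hand rather than taken from the paper.

```python
import math
from statistics import fmean


def jitter_local(periods):
    """Jitter (local): mean |T_i - T_{i+1}| divided by the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return fmean(diffs) / fmean(periods)


def jitter_rap(periods):
    """Jitter (RAP): mean |T_i - average(T_{i-1}, T_i, T_{i+1})| / mean period."""
    if len(periods) < 3:
        raise ValueError("need at least three periods")
    devs = [abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
            for i in range(1, len(periods) - 1)]
    return fmean(devs) / fmean(periods)


def shimmer_local(amps):
    """Shimmer (local): mean |A_i - A_{i+1}| divided by the mean amplitude."""
    if len(amps) < 2:
        raise ValueError("need at least two amplitudes")
    diffs = [abs(a - b) for a, b in zip(amps, amps[1:])]
    return fmean(diffs) / fmean(amps)


def shimmer_local_db(amps):
    """Shimmer (local, dB): mean |20 * log10(A_{i+1} / A_i)|."""
    if len(amps) < 2:
        raise ValueError("need at least two amplitudes")
    return fmean([abs(20 * math.log10(b / a)) for a, b in zip(amps, amps[1:])])


def moving_average(values, width=3):
    """Hypothetical toy stand-in for over-smoothing (not the HMM process):
    a moving average that keeps only fully overlapping windows."""
    return [fmean(values[i:i + width]) for i in range(len(values) - width + 1)]


if __name__ == "__main__":
    # Hand-made toy data, not from the paper: 100 glottal periods alternating
    # around 8 ms (125 Hz) and peak amplitudes alternating between 1.0 and 0.9.
    periods = [0.008 + 0.0002 * (-1) ** i for i in range(100)]
    amps = [1.0 if i % 2 == 0 else 0.9 for i in range(100)]
    smoothed = moving_average(periods)

    print(f"jitter (local): {100 * jitter_local(periods):.2f} % -> "
          f"{100 * jitter_local(smoothed):.2f} % after smoothing")
    print(f"jitter (rap):   {100 * jitter_rap(periods):.2f} % -> "
          f"{100 * jitter_rap(smoothed):.2f} % after smoothing")
    print(f"shimmer (local): {100 * shimmer_local(amps):.2f} %, "
          f"shimmer (local, dB): {shimmer_local_db(amps):.3f} dB")
```

Smoothing the alternating period sequence with a three-point window lowers its local jitter from 5.00 % to 1.67 %, a very simplified illustration of how over-smoothing removes fine cycle-to-cycle variation. In practice, the period and amplitude sequences would be obtained from a tool such as Praat rather than constructed by hand.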

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Coto-Jiménez, M., Goddard-Close, J., Martínez-Licona, F.M. (2014). Quality Assessment of HMM-Based Speech Synthesis Using Acoustical Vowel Analysis. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_46

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)
