Experiment with Evaluation of Quality of the Synthetic Speech by the GMM Classifier

  • Conference paper
Text, Speech, and Dialogue (TSD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

This paper describes our experiment with using Gaussian mixture models (GMMs) to evaluate the quality of speech produced by different methods of speech synthesis and parameterization. In addition, the paper analyzes and compares the influence of different types of features and different numbers of mixtures used for the GMM evaluation. Finally, the GMM evaluation scores are compared with the results of conventional listening tests based on mean opinion score (MOS) evaluations. The results obtained by the two approaches correspond to each other.
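The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of one plausible reading: a diagonal-covariance GMM is trained on feature vectors from reference speech, and feature vectors from each synthesis method are then scored by their average log-likelihood under that model, repeated for several numbers of mixtures. All function names, the random stand-in data, and the mixture counts are assumptions made for illustration; they are not taken from the paper, and the paper's actual features, toolbox, and settings may differ.

```python
import numpy as np

def component_log_probs(X, weights, means, variances):
    """Log of weight_k * N(x | mean_k, diag(var_k)) for every frame and mixture."""
    diff = X[:, None, :] - means[None, :, :]            # (frames, mixtures, dims)
    log_gauss = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                        + diff ** 2 / variances[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss

def log_sum_exp(a):
    """Numerically stable log(sum(exp(a))) along axis 1."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def fit_diag_gmm(X, n_mix, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_frames = X.shape[0]
    # Initialise means from randomly chosen frames, variances from the data.
    means = X[rng.choice(n_frames, size=n_mix, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_mix, 1))
    weights = np.full(n_mix, 1.0 / n_mix)
    for _ in range(n_iter):
        # E-step: posterior probability of each mixture for each frame.
        log_p = component_log_probs(X, weights, means, variances)
        resp = np.exp(log_p - log_sum_exp(log_p)[:, None])
        # M-step: re-estimate weights, means and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """GMM score of a feature matrix: mean per-frame log-likelihood."""
    return float(log_sum_exp(component_log_probs(X, *gmm)).mean())

# Random stand-ins for real speech features (hypothetical data, not results).
rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(500, 4))      # "reference" speech frames
close_system = rng.normal(0.1, 1.0, size=(200, 4))   # synthesis close to reference
far_system = rng.normal(1.5, 1.0, size=(200, 4))     # synthesis far from reference

scores = {}
for n_mix in (1, 2, 4):                              # assumed mixture counts
    gmm = fit_diag_gmm(reference, n_mix)
    scores[n_mix] = (avg_log_likelihood(close_system, gmm),
                     avg_log_likelihood(far_system, gmm))
    print(n_mix, scores[n_mix])
```

In this sketch a higher score means the features lie closer to the reference model, so the resulting ranking of systems is what would be set against the ranking given by MOS listening tests.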

The work has been supported by the Technology Agency of the Czech Republic, project No. TA01030476, the Grant Agency of the Slovak Academy of Sciences (VEGA 2/0090/11), and the Ministry of Education of the Slovak Republic (VEGA 1/0987/12).

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Přibil, J., Přibilová, A., Matoušek, J. (2013). Experiment with Evaluation of Quality of the Synthetic Speech by the GMM Classifier. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_31

  • DOI: https://doi.org/10.1007/978-3-642-40585-3_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science, Computer Science (R0)
