Optimal Number of States in HMM-Based Speech Synthesis

  • Conference paper
Text, Speech, and Dialogue (TSD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

This paper deals with using models with a variable number of states in an HMM-based speech synthesis system. It also gives implementation details on using these models in systems based on the HTS toolkit, which cannot directly handle models with unequal numbers of states; a workaround that enables this functionality is proposed. A data-based method for determining the optimal number of states for particular models is also proposed and experimentally tested on 4 large speech corpora. A preference listening test focused on local differences showed that listeners preferred the proposed system over the traditional system with 5-state models, while the proposed system is also smaller (it has a lower total number of states).

This research was supported by the Czech Science Foundation (GA CR), project No. GA16-04420S. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme CESNET LM2015042, is greatly appreciated.

Notes

  1. Praat: doing phonetics by computer, www.praat.org.

  2. HMM-based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp.

  3. The detailed scheme of the training procedure is more complex; e.g., the reestimation and clustering of models are usually repeated twice.

  4. A bug in the HTS toolkit ver. 2.2 (file HFB.c) had to be fixed to allow the use of 1-state models; otherwise, it did not work properly.

  5. Names of HTS tools are stated here to specify the point of transition to 1-state models as precisely as possible.

  6. However, the proposed methods are certainly not language-dependent.

Author information

Correspondence to Zdeněk Hanzlíček.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hanzlíček, Z. (2017). Optimal Number of States in HMM-Based Speech Synthesis. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_40

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer Science, Computer Science (R0)
